System and method having transparent composite model for transform coefficients

ABSTRACT

To better handle the flat tail phenomenon commonly seen in transform coefficients such as DCT coefficients, a system and method having a model dubbed a transparent composite model (TCM) are described. Given a sequence of transform coefficients, a TCM first separates the tail of the sequence from the main body of the sequence. A first distribution such as a uniform, truncated Laplacian, or truncated geometric distribution can be used to model transform coefficients in the flat tail while at least one parametric distribution (e.g. truncated Laplacian, generalized Gaussian (GG), and geometric distributions) can be used to model data in the main body. A plurality of boundary values can be used to bound a plurality of distribution models. The plurality of boundary values and other parameters of the TCM can be estimated via maximum likelihood (ML) estimation or greedy estimation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 61/827,321 filed May 24, 2013 entitled TRANSPARENT COMPOSITE MODEL FOR DCT COEFFICIENTS: DESIGN AND ANALYSIS, the contents of which are hereby incorporated by reference into the Detailed Description of Example Embodiments.

TECHNICAL FIELD

Embodiments of the present invention generally relate to modeling of transform coefficients such as DCT coefficients, and in particular to methods and systems having a transparent composite model for transform coefficients.

BACKGROUND

From its earlier adoption in JPEG to its recent application in HEVC (High Efficiency Video Coding), the newest video coding standard [3], the discrete cosine transform (DCT) has been widely applied in digital signal processing, particularly in lossy image and video coding. It has thus attracted, during the past few decades, a lot of interest in understanding the statistical distribution of DCT coefficients (see, for example, [1], [4], [7], [9], and references therein). Deep and accurate understanding of the distribution of DCT coefficients would be useful to quantization design [12], entropy coding, rate control [7], image understanding and enhancement [1], and image and video analytics [13] in general.

In the literature, Laplacian distributions, Cauchy distributions, Gaussian distributions, mixtures thereof, and generalized Gaussian (GG) distributions have all been suggested to model the distribution of DCT coefficients (see, for example, [2], [4], [9], and references therein). Depending on the actual image data sources used and the need to balance modeling accuracy and model's simplicity/practicality, each of these models may be justified to some degree for some specific application. In general, it is believed that in terms of modeling accuracy, GG distributions with a shape parameter and a scale parameter achieve the best performance [2][9]. However, parameter estimation for GG distributions is difficult and hence the applicability of the GG model to applications, particularly online applications, may be limited. On the other hand, the Laplacian model has been found to balance well between complexity and modeling accuracy; it has been widely adopted in image and video coding [12], although its modeling accuracy is significantly inferior to that of the GG model [2].

SUMMARY

To better handle the flat tail phenomenon commonly seen in DCT coefficients, a system and method is provided including a model dubbed a transparent composite model (TCM). Given a sequence of DCT coefficients, a TCM first separates the tail of the sequence from the main body of the sequence. Then, a uniform distribution is used to model DCT coefficients in the flat tail while a different parametric distribution (such as truncated Laplacian, generalized Gaussian (GG), and geometric distributions) is used to model data in the main body. The TCM is continuous if each DCT coefficient is regarded as continuous (i.e., analog), and discrete if each DCT coefficient is discrete. The separation boundary and other parameters of the TCM can be estimated via maximum likelihood (ML) estimation. Efficient online algorithms with global convergence are developed to compute the ML estimates of these parameters. Analysis and experimental results show that for real-valued continuous AC coefficients, (1) the TCM with truncated GG distribution as its parametric distribution (GGTCM) offers the best modeling accuracy among pure Laplacian models, pure GG models, and the TCM with truncated Laplacian distribution as its parametric distribution (LPTCM), at the cost of extra complexity; and (2) LPTCM offers a modeling accuracy comparable to pure GG models, but with a lower complexity. On the other hand, for discrete/integer DCT coefficients, which are mostly seen in real-world applications of DCT, extensive experiments show via both the divergence test and Chi-square test that the discrete TCM with truncated geometric distribution as its parametric distribution (GMTCM) models AC coefficients more accurately than pure Laplacian models and GG models in the majority of cases while having simplicity and practicality similar to those of pure Laplacian models.
In addition, it is demonstrated that the GMTCM also exhibits a good capability of feature extraction—DCT coefficients in the flat tail identified by the GMTCM are truly outliers, and these outliers across all AC frequencies of an image represent an outlier image revealing some unique global features of the image. This, together with the low complexity of GMTCM, makes the GMTCM a desirable choice for modeling discrete/integer DCT coefficients in real-world applications, such as image and video coding, image understanding, image enhancement, etc.

To further improve modeling accuracy, the concept of TCM can be extended by further separating the main portion into multiple sub-portions and modeling each sub-portion by a different parametric distribution (such as truncated Laplacian, generalized Gaussian (GG), and geometric distributions). The resulting model is dubbed a multiple segment TCM (MTCM). In the case of general MTCMs based on truncated Laplacian and geometric distributions (referred to as MLTCM and MGTCM, respectively), a greedy algorithm is developed for determining a desired number of segments and for estimating the corresponding separation boundaries and other MTCM parameters. For bi-segment TCMs, an efficient online algorithm is further presented for computing the maximum likelihood (ML) estimates of the separation boundary and other parameters. Experiments based on Kullback-Leibler (KL) divergence and χ^2 test show that (1) for real-valued continuous AC coefficients, the bi-segment TCM based on truncated Laplacian (BLTCM) models AC coefficients more accurately than the LPTCM and GG model while having simplicity and practicality similar to those of LPTCM and pure Laplacian; and (2) for discrete (integer or quantized) DCT coefficients, the bi-segment TCM based on truncated geometric distribution (BGTCM) significantly outperforms the GMTCM and GG model in terms of modeling accuracy, while having simplicity and practicality similar to those of GMTCM. Also shown is that the MGTCM derived by the greedy algorithm further improves the modeling accuracy over BGTCM at the cost of more parameters and slight increase in complexity.

In accordance with an example embodiment, there is provided a method for modelling a set of transform coefficients, the method being performed by a device and including: determining at least one boundary coefficient value; determining one or more parameters of a first distribution model for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values; determining parameters of at least one further distribution model for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values; and performing a device operation on at least part of a composite distribution model which is a composite of the first distribution model and the at least one further distribution model having the respective determined parameters.

In accordance with an example embodiment, there is provided a method for a set of transform coefficients, the method being performed by a device and including: determining at least one boundary coefficient value which satisfies a maximum likelihood estimation between the set of transform coefficients and a composite distribution model which is a composite of a plurality of distribution models each for a subset of transform coefficients of the set bounded by each of the at least one boundary coefficient values; and performing a device operation on at least one of the subsets of transform coefficients.

In accordance with an example embodiment, there is provided a method for modelling a set of transform coefficients, the method being performed by a device and including: determining a boundary coefficient value; determining one or more parameters of a uniform distribution model for transform coefficients of the set the magnitudes of which are greater than the boundary coefficient value; determining parameters of a parametric distribution model for transform coefficients of the set the magnitudes of which are less than the boundary coefficient value; and performing a device operation on at least part of a composite distribution model which is a composite of the uniform distribution model and the parametric distribution model having the respective determined parameters.

In accordance with an example embodiment, there is provided a device, including memory, a component configured to access a set of transform coefficients, and a processor configured to execute instructions stored in the memory in order to perform any or all of the described methods.

In accordance with an example embodiment, there is provided a non-transitory computer-readable medium containing instructions executable by a processor for performing any or all of the described methods.

BRIEF DESCRIPTION OF THE DRAWINGS

Reference will now be made, by way of example, to the accompanying drawings which show example embodiments, in which:

FIG. 1 illustrates separate histograms of two AC components in the 8×8 DCT block of the 512×512 Lenna image.

FIG. 2 illustrates detail of the histograms of FIG. 1 of the flat tail phenomenon in the 512×512 Lenna image.

FIG. 3 illustrates the overall curves of the LPTCM and GGTCM for two AC components in the 8×8 DCT block of the 512×512 Lenna image.

FIG. 4 illustrates detail of the overall curves of FIG. 3 of the tails of the LPTCM and GGTCM for two AC components in the 8×8 DCT block of the 512×512 Lenna image.

FIG. 5 illustrates uniform quantization with deadzone.

FIG. 6 illustrates a test image set 1: From left to right, top-down, they are referred to as ‘bird’, ‘boat’, ‘fish’, ‘couple/Cp’, ‘hill’, ‘lena’, ‘baboon/Bb’, ‘mountain/Bt’, and ‘pepper/Pp’, respectively.

FIG. 7 illustrates test image set 2: These images are referred to as ‘B1’, ‘B2’, ‘B3’, ‘B4’, and ‘B5’, respectively.

FIG. 8 illustrates a test image set 3: These images are the first frame of four class-F sequences for HEVC screen content tests. The original file names, which also indicate the image resolution, are ‘SlideEditing_1280×720’, ‘SlideShow_1280×720’, ‘Chinaspeed_1024×768’, and ‘BasketballDrillText_832×480’, respectively. In the text, their names are abbreviated as ‘SE’, ‘SS’, ‘CS’ and ‘BbT’.

FIG. 9 illustrates an original image (top), inlier image (middle), and outlier image (bottom), with demonstration of the perceptual importance of outlier coefficients by the image of ‘terrace’.

FIG. 10 illustrates an original image (top), inlier image (middle), and outlier image (bottom), with demonstration of the perceptual importance of outlier coefficients by the image of ‘Lenna’.

FIG. 11 illustrates an original image (top), inlier image (middle), and outlier image (bottom), with demonstration of the perceptual importance of outlier coefficients by the image of ‘BbT’.

FIG. 12 illustrates Algorithm 1: Computing the ML estimate of (y_c, b, θ) in a general continuous TCM, in accordance with an example embodiment.

FIG. 13 illustrates Algorithm 2: An iterative algorithm for computing the ML estimate of λ in a truncated Laplacian distribution, in accordance with an example embodiment.

FIG. 14 illustrates Algorithm 3: An iterative algorithm for computing the ML estimate λ_K of λ in a truncated geometric distribution, in accordance with an example embodiment.

FIG. 15 illustrates Algorithm 4: An algorithm for computing the ML estimate (b*, p*, λ*, K*) in the GMTCM, in accordance with an example embodiment.

FIG. 16 shows Table 1: Overall comparisons between the LPTCM and GG model for 9 images for continuous DCT coefficients.

FIG. 17 shows Table 2: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF=100.

FIG. 18 shows Table 3: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF=90.

FIG. 19 shows Table 4: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF=80.

FIG. 20 shows Table 5: Overall comparisons between the GMTCM and GG model for all images coded using JPEG with QF=70.

FIG. 21 shows Table 6: The χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘bird’ with QF=100.

FIG. 22 shows Table 7: The χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘boat’ with QF=100.

FIG. 23 shows Table 8: The χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘CS’ with QF=100.

FIG. 24 shows Table 9: The χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘CS’ with QF=90.

FIG. 25 illustrates a block diagram of an example device, in accordance with an example embodiment.

FIG. 26 illustrates an example method for modelling a set of transform coefficients, in accordance with an example embodiment.

FIG. 27 illustrates the χ² scores and KL divergence scores by GGD, Laplace, GMTCM, BGTCM and MGTCM for the first 15 low-frequency ACs along the zigzag order from JPEG-coded image ‘boat’ with QF=100.

FIG. 28 illustrates the χ² scores and KL divergence scores by GGD, Laplace, GMTCM, BGTCM and MGTCM for the first 15 low-frequency ACs along the zigzag order from JPEG-coded image ‘lenna’ with QF=100.

FIG. 29 illustrates the χ² scores and KL divergence scores by GGD, Laplace, GMTCM, BGTCM and MGTCM for the first 15 low-frequency ACs along the zigzag order from JPEG-coded image ‘CS’ with QF=100.

FIG. 30 illustrates the χ² scores and KL divergence scores by GGD, Laplace, GMTCM, BGTCM and MGTCM for the first 15 low-frequency ACs along the zigzag order from JPEG-coded image ‘B5’ with QF=100.

FIG. 31 illustrates Algorithm 5: An algorithm for computing the ML estimate of (y_{c_1}, b₁, θ) in the bi-segment TCM, in accordance with an example embodiment.

FIG. 32 illustrates Algorithm 6: A greedy algorithm for estimating y_{c_1}, b₁, and λ₁ in MLTCM, in accordance with an example embodiment.

FIG. 33 illustrates Algorithm 7: A greedy algorithm for determining l and estimating y_c, b, and λ in MLTCM, in accordance with an example embodiment.

FIG. 34 illustrates Algorithm 8: A greedy algorithm for estimating K₁, b₁, and λ₁ in MGTCM, in accordance with an example embodiment.

FIG. 35 illustrates Algorithm 9: A greedy algorithm for determining l and estimating K, b, and λ in MGTCM from u^(n), in accordance with an example embodiment.

FIG. 36 shows Table 10: Overall comparisons between the BLTCM and GG model for modeling 15 low frequency continuous DCT coefficients.

FIG. 37 shows Table 11: Overall comparisons between the LPTCM and GG model for modeling 15 low frequency DCT coefficients.

FIG. 38 shows Table 12: Overall comparisons between BGTCM and the GG model for modeling 15 low frequency DCT coefficients.

FIG. 39 shows Table 13: Overall comparisons between the GMTCM and GG model for modeling 15 low frequency DCT coefficients.

FIG. 40 shows Table 14: Overall comparisons between the MGTCM and BGTCM model for modeling low frequency DCT coefficients.

Similar reference numerals may be used in different figures to denote similar components.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

In accordance with an example embodiment, there is provided a method for modelling a set of transform coefficients, the method being performed by a device and including: determining at least one boundary coefficient value; determining one or more parameters of a first distribution model for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values; determining parameters of at least one further distribution model for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values; and performing a device operation on at least part of a composite distribution model which is a composite of the first distribution model and the at least one further distribution model having the respective determined parameters.

In accordance with an example embodiment, there is provided a method for a set of transform coefficients, the method being performed by a device and including: determining at least one boundary coefficient value which satisfies a maximum likelihood estimation between the set of transform coefficients and a composite distribution model which is a composite of a plurality of distribution models each for a subset of transform coefficients of the set bounded by each of the at least one boundary coefficient values; and performing a device operation on at least one of the subsets of transform coefficients.

1 Introduction to TCM

Both Laplacian and GG distributions decay exponentially fast. However, in many cases it is observed herein that DCT coefficients have a relatively flat tail, which cannot be effectively modeled by an exponentially decaying function (see FIGS. 1-4 and associated description herein below). Although the tail portion of DCT coefficients is insignificant statistically, it contains values of large magnitude, which arguably represent important features or information about the underlying image, and hence should be handled with care. Indeed, improvement on modeling the tail portion could lead to better coding performance, as shown in [7] in video coding, where a Cauchy distribution, which decays much more slowly than Laplacian distributions, was used to derive a rate model and a distortion model for DCT coefficients in rate control for video coding, leading to a significant coding gain. However, the Cauchy model may not model the main portion of DCT coefficients effectively, and is in general inferior to the GG model in terms of the overall modeling accuracy [4]. Therefore, in addition to balancing modeling accuracy and model's simplicity/practicality, a good model of DCT coefficients also needs to balance the main portion and tail portion of DCT coefficients.

To better handle the flat tail phenomenon in DCT coefficients, in this disclosure, we develop a model dubbed transparent composite model (TCM), in which the tail portion of DCT coefficients is modeled separately from the main portion of DCT coefficients by a first distribution, and the main portion is modeled instead by a different parametric distribution such as truncated Laplacian, GG, and geometric distributions. This composite model introduces a boundary parameter to control which model to use for any given DCT coefficient; it is marked as transparent because there is no ambiguity regarding which model (the first distribution model or at least one further distribution model) a given DCT coefficient will fall into once the TCM is determined. The TCM is continuous if each DCT coefficient is regarded continuous (i.e., analog), and discrete if each DCT coefficient is discrete.

The separation boundary and other parameters of the TCM can be estimated via maximum likelihood (ML) estimation. We further propose efficient online algorithms with global convergence to compute the ML estimates of these parameters. Analysis and experimental results show that for real-valued continuous AC coefficients, (1) the TCM with truncated GG distribution as its parametric distribution (GGTCM) offers the best modeling accuracy among pure Laplacian models, pure GG models, and the TCM with truncated Laplacian distribution as its parametric distribution (LPTCM), at the cost of extra complexity; and (2) LPTCM matches up to pure GG models in terms of modeling accuracy, but with simplicity and practicality similar to those of pure Laplacian models, hence having the best of both pure GG and Laplacian models. On the other hand, for discrete/integer DCT coefficients, which are mostly seen in real-world applications of DCT, extensive experiments show via both the divergence test and Chi-square test that the discrete TCM with truncated geometric distribution as its parametric distribution (GMTCM) models AC coefficients more accurately than pure Laplacian models and GG models in the majority of cases while having simplicity and practicality similar to those of pure Laplacian models. In addition, it is demonstrated that the GMTCM also exhibits a good capability of feature extraction. That is, DCT coefficients in the flat tail identified by the GMTCM are truly outliers, and these outliers across all AC frequencies of an image represent an outlier image revealing some unique global features of the image. This, together with the simplicity of modeling and the low complexity of computing online the ML estimates of the parameters of the GMTCM, makes the GMTCM a desirable choice for modeling discrete/integer DCT coefficients in real-world applications, such as image and video coding, image understanding, image enhancement, etc.

2 DCT Models and the Flat Tail Phenomenon

This section first reviews briefly some relevant studies in the literature for modeling DCT coefficients. We then discuss the flat tail phenomenon in DCT coefficients.

2.1 Models in the Literature for DCT Coefficients

2.1.1 Gaussian Distributions

As Gaussian distributions are widely used in natural and social sciences for real-valued random variables, they have been naturally applied to model DCT coefficients [1]. The justification for the Gaussian model may come from the central limit theorem (CLT) [11], which states that the mean of a sufficiently large number of independent random variables will be approximately normally distributed. Because each DCT coefficient is a linear weighted sum of input samples, the CLT provides meaningful guidance for modeling DCT coefficients with Gaussian distributions. A comprehensive collection of distributions based on the Gaussian probability density function was studied in [8].

Although the Gaussian model is backed up by the CLT, it was observed that DCT coefficients for natural images/video usually possess a tail heavier than Gaussian distributions [2]. Consequently, generalized Gaussian distributions have been suggested for modeling DCT coefficients.

2.1.2 Generalized Gaussian Distributions

The DCT coefficients may be modeled with a generalized Gaussian distribution with zero mean, as follows

$\begin{matrix} {f(y) = \frac{\beta}{2\alpha\Gamma(1/\beta)} e^{-(|y|/\alpha)^{\beta}}} & (1) \end{matrix}$ where α is a positive scale parameter, β is a positive shape parameter, and Γ(·) denotes the gamma function.

It is easy to see that when β=1, the above GG distribution reduces to a Laplacian distribution. When β=2, it becomes the Gaussian distribution with variance α²/2. With the free choice of the scale parameter α and the shape parameter β, the GG distribution has been shown to be an effective way to parameterize a family of symmetric distributions spanning from Gaussian to uniform densities, and a family of symmetric distributions spanning from Laplacian to Gaussian distributions. As mentioned above, DCT coefficient distributions are observed to possess flat tails. In this regard, the GG distribution allows for heavier-than-Gaussian tails with β<2, heavier-than-Laplacian tails with β<1, or lighter-than-Gaussian tails with β>2. With this flexibility, the GG model in general outperforms both the Gaussian and Laplacian models in terms of accuracy for modeling DCT coefficients.
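The two special cases named above can be checked numerically. The following is a minimal sketch, not part of the described embodiments; the function names `gg_pdf`, `laplacian_pdf`, and `gaussian_pdf` are illustrative assumptions.

```python
import math

def gg_pdf(y, alpha, beta):
    """Zero-mean generalized Gaussian density of equation (1)."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(y) / alpha) ** beta))

def laplacian_pdf(y, lam):
    """Zero-mean Laplacian density with scale lam."""
    return math.exp(-abs(y) / lam) / (2.0 * lam)

def gaussian_pdf(y, variance):
    """Zero-mean Gaussian density with the given variance."""
    return math.exp(-y * y / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

if __name__ == "__main__":
    alpha, y = 1.5, 0.7
    # beta = 1 should match a Laplacian with scale alpha.
    print(gg_pdf(y, alpha, 1.0), laplacian_pdf(y, alpha))
    # beta = 2 should match a Gaussian with variance alpha^2 / 2.
    print(gg_pdf(y, alpha, 2.0), gaussian_pdf(y, alpha ** 2 / 2.0))
```

Each printed pair should agree up to floating-point rounding, which is one way to sanity-check an implementation of equation (1).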

Nevertheless, the benefit of accurate modeling by the GG model comes with some inevitable drawbacks. For example, the lack of a closed-form cumulative distribution function (cdf) makes it difficult to apply the GG model in practice. Another main drawback is the high complexity of its parameter estimation. For example, given a sequence of samples Y_i, i=1, ..., n, the ML estimate of the shape parameter β is the root of the following equation [2],

$\begin{matrix} {\frac{\psi(1/\beta + 1) + \log(\beta)}{\beta^{2}} + \frac{1}{\beta^{2}}\log\left( \frac{1}{n}\sum\limits_{i = 1}^{n} |Y_{i}|^{\beta} \right) - \frac{\sum\limits_{i = 1}^{n} |Y_{i}|^{\beta}\log|Y_{i}|}{\beta\sum\limits_{i = 1}^{n} |Y_{i}|^{\beta}} = 0, \quad \text{where} \quad \psi(\tau) = -\gamma + \int_{0}^{1} (1 - t^{\tau - 1})(1 - t)^{-1}\,dt} & (2) \end{matrix}$ and γ=0.577... denotes the Euler constant. Clearly, the terms Σ_{i=1}^{n} |Y_i|^β log|Y_i| and β Σ_{i=1}^{n} |Y_i|^β require a significant amount of computation when a numerical iterative solution of β is used.
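To make the cost concrete, the sketch below evaluates the left-hand side of equation (2) at a single trial value of β. It is an illustration under stated assumptions, not the estimator of [2]: ψ is taken to be the standard digamma function (for which ψ(1) = −γ), computed here with a recurrence and an asymptotic series, and the names `digamma` and `gg_shape_equation` are hypothetical. Every evaluation passes over all n samples, and an iterative root finder must repeat this for each trial β.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        # Shift the argument upward using psi(x) = psi(x + 1) - 1/x.
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def gg_shape_equation(samples, beta):
    """Left-hand side of equation (2) at a trial shape parameter beta > 0."""
    n = len(samples)
    # Zero-valued samples add nothing to either sum, so they are skipped
    # to avoid evaluating log(0).
    mags = [abs(y) for y in samples if y != 0]
    s_pow = sum(m ** beta for m in mags)
    s_pow_log = sum((m ** beta) * math.log(m) for m in mags)
    return ((digamma(1.0 / beta + 1.0) + math.log(beta)) / beta ** 2
            + math.log(s_pow / n) / beta ** 2
            - s_pow_log / (beta * s_pow))

if __name__ == "__main__":
    # Deterministic quantiles of a unit-scale Laplacian magnitude (assumed test data).
    n = 2000
    data = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    for beta in (0.5, 1.0, 2.0):
        print(beta, gg_shape_equation(data, beta))
```

For Laplacian-like data the value at β=1 should be close to zero, consistent with the GG model reducing to the Laplacian model at β=1.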

2.1.3 Laplacian Distributions

Due to its ability to balance modeling accuracy and model's simplicity/practicality, the Laplacian model for DCT coefficients is the most popular one in use [10], [9]. A Laplacian density function with zero mean is given as follows,

$\begin{matrix} {f(y) = \frac{1}{2\lambda} e^{-|y|/\lambda},} & (3) \end{matrix}$ where λ denotes a positive scale parameter. Given a sequence of samples Y_i, i=1, ..., n, the ML estimate of λ can be easily computed as

$\begin{matrix} {\lambda = \frac{1}{n}\sum\limits_{i = 1}^{n} |Y_{i}|.} & (4) \end{matrix}$ In addition, under the Laplacian distribution, the probability for an interval [L, H] with H>L≥0 can also be computed easily as

$\frac{1}{2}\left( e^{-L/\lambda} - e^{-H/\lambda} \right).$
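Both closed-form expressions are simple enough to state in a few lines. The following is a minimal sketch; the function names `laplacian_ml_scale` and `laplacian_interval_prob` are illustrative assumptions.

```python
import math

def laplacian_ml_scale(samples):
    """ML estimate of the Laplacian scale: the mean absolute value, equation (4)."""
    return sum(abs(y) for y in samples) / len(samples)

def laplacian_interval_prob(lam, low, high):
    """Probability that a zero-mean Laplacian sample falls in [low, high], high > low >= 0."""
    return 0.5 * (math.exp(-low / lam) - math.exp(-high / lam))

if __name__ == "__main__":
    lam = laplacian_ml_scale([1.0, -2.0, 3.0])
    print(lam)
    print(laplacian_interval_prob(lam, 0.0, 2.0 * math.log(2.0)))
```

A single pass over the data suffices for the estimate, in contrast to the iterative root finding needed for the GG shape parameter.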

2.1.4 Other Distributions

Other distributions have also been investigated in the literature for modeling DCT coefficients [5], [7], [6], [8]. In [5], alpha-stable distributions were used to model DCT coefficients for watermark detection. As a special case of alpha-stable distributions, the Cauchy distribution was used in [7] for modeling DCT coefficients in video coding. The alpha-stable distributions were reported to provide a satisfactory modeling accuracy for the corresponding image processing goals in [5] and [7]. Yet, the lack of a closed form for the alpha-stable family of distributions usually leads to difficulties in parameter estimation and limits their application to modeling DCT coefficients. In [6], a symmetric normal inverse Gaussian distribution was studied for modeling DCT coefficients, as follows:

$\begin{matrix} {f(y) = \frac{A(\delta,\alpha) K_{1}\left( \alpha\sqrt{\delta^{2} + y^{2}} \right)}{\sqrt{\delta^{2} + y^{2}}}, \quad \text{where} \quad K_{\lambda}(\xi) = \frac{1}{2}\int_{0}^{\infty} z^{\lambda - 1}\exp\left( -\frac{1}{2}\xi(z + z^{-1}) \right) dz, \quad \text{and} \quad A(\delta,\alpha) = \frac{\delta\alpha}{\pi}\exp(\delta\alpha).} & (5) \end{matrix}$ This model was tested using the Kolmogorov-Smirnov test and was reported to improve modeling accuracy over generalized Gaussian and Laplacian distributions. Yet, its complexity is still significantly higher than that of a Laplacian model. Moreover, the Kolmogorov-Smirnov test is generally regarded as less preferable for measuring the modeling accuracy than the χ² test [2], and by the χ² test, the best modeling accuracy is achieved by the GG distributions. The test statistic χ² is defined as

$\begin{matrix} {\chi^{2} = \sum\limits_{i = 1}^{I} \frac{(n_{i} - n \cdot p_{i})^{2}}{n \cdot p_{i}}} & (6) \end{matrix}$ where I is the number of intervals into which the sample space is partitioned, n is the total number of samples, n_i denotes the number of samples in the ith interval, and p_i is the probability under the underlying theoretical model that a sample falls into the interval i.
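The statistic can be computed directly from the per-interval counts and model probabilities. The following is a minimal sketch; the function name `chi_square_statistic` and the example numbers are illustrative assumptions, not data from the experiments.

```python
def chi_square_statistic(counts, probs):
    """Chi-square statistic of equation (6).

    counts[i] is the number of samples observed in interval i, and probs[i]
    is the model probability of interval i (each assumed to be positive).
    """
    n = sum(counts)
    return sum((n_i - n * p_i) ** 2 / (n * p_i) for n_i, p_i in zip(counts, probs))

if __name__ == "__main__":
    # 60 samples in three intervals; the model expects 12, 18, and 30.
    print(chi_square_statistic([10, 20, 30], [0.2, 0.3, 0.5]))
```

A smaller value indicates that the observed histogram is closer to the counts predicted by the model.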

As in [2], this disclosure prefers the χ² test over the Kolmogorov-Smirnov test for measuring the modeling accuracy. Besides the justification provided in [2] for using the χ² test, our preference is also rooted in the flat-tail phenomenon of DCT coefficients. Specifically, the χ² test better characterizes a statistically insignificant tail portion in a distribution, while the Kolmogorov-Smirnov test, which depends on a sample distribution function, tends to overlook the tail part. The flat-tail phenomenon, however, has been widely observed for DCT coefficients, as in [5], [7]. In the following, the flat tail phenomenon is discussed in more detail.

2.2 Flat Tails

Laplacian, Gaussian, and GG distributions all decay exponentially fast. As illustrated in FIGS. 1 and 2, however, DCT coefficients usually possess a much heavier tail. FIG. 1 was obtained by applying the floating-point type-II 8×8 DCT to the well-known 512×512 Lenna image, where the vertical bars show the histogram of the DCT coefficients. It is evident from FIG. 1 that the histogram of the DCT coefficients first decays quite rapidly for the main portion of DCT coefficients and then becomes relatively flat for the tail portion of DCT coefficients. Statistically, the tail portion of DCT coefficients is insignificant. However, it contains DCT coefficients of large magnitude, which usually have greater impacts on image quality, image features, quantization, etc. than other coefficients and hence deserve a better fit in modeling.

FIG. 2 zooms in on the tail portion of FIG. 1 and further compares the histogram of DCT coefficients against the GG and Laplacian models, where the vertical bars again represent the histogram of DCT coefficients, and the two illustrated curves show results from the GG and Laplacian models, respectively. In FIG. 2, the ML estimates of the parameters of the GG model were computed via Matlab codes from [13] while the λ value of the Laplacian model was computed using (4). For both models, the χ² tests were performed to evaluate their respective modeling accuracy. According to the χ² test, the GG model significantly outperforms the Laplacian model. Furthermore, in each case of FIG. 2, the obtained shape parameter β is much smaller than 1, meaning that the resulting GG distribution possesses a tail heavier than that of the Laplacian distribution. In comparison with the real data histogram shown in FIG. 2, however, the GG model still suffers from an exponentially bounded tail, which is much lighter than that of the DCT coefficients.

The flat tail phenomenon in the Lenna image is widely observed in other images as well. As shown in [2], the estimated shape parameter β for the GG distribution for various images is less than 1 in most cases, indicating that the data distribution possesses a tail heavier than that of the Laplacian distribution. In [7], it was also observed that the tail of DCT coefficients in video coding is much heavier than that of the Laplacian distribution, and a Cauchy distribution was used instead for deriving rate and distortion models for DCT coefficients. However, as mentioned before, the Cauchy model may not model the main portion of DCT coefficients effectively, and is in general inferior to the GG model in terms of the overall modeling accuracy [4]. Therefore, it is advantageous to have a model which can balance well the main portion and tail portion of DCT coefficients while having both simplicity and superior modeling accuracy.

3 Continuous Transparent Composite Model

To better handle the flat tail phenomenon in DCT coefficients, we now separate the tail portion of DCT coefficients from the main portion of DCT coefficients and use a different model to model each of them. Since DCT coefficients in the tail portion are insignificant statistically, each of them often appears once or a few times in the entire image or video frame. Hence it would make sense to model them separately by a uniform distribution while modeling the main portion by a parametric distribution such as truncated Laplacian, GG, and geometric distributions, yielding a model we call a transparent composite model. In this section, we assume that DCT coefficients are continuous (i.e. can take any real value), and consider continuous TCMs.

3.1 Description of General Continuous TCMs

Consider a probability density function (pdf) f(y|θ) with parameters θ∈Θ, where θ could be a vector and Θ is the parameter space. Let F(y|θ) be the corresponding cdf, i.e.

$F(y|\theta)\overset{\Delta}{=}\int_{-\infty}^{y} f(u|\theta)\,du.$

Assume that f(y|θ) is symmetric in y with respect to the origin, and F(y|θ) is concave as a function of y in the region y≧0. It is easy to verify that Laplacian, Gaussian, and GG distributions all satisfy this assumption. The TCM based on F (y|θ) is defined as

$\begin{matrix} {p(y|y_{c},b,\theta)\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{b}{2F(y_{c}|\theta) - 1}f(y|\theta)} & {\text{if } |y| < y_{c}} \\ {\frac{1 - b}{2(a - y_{c})}} & {\text{if } y_{c} < |y| \leq a} \\ {\max\left\{ \frac{b}{2F(y_{c}|\theta) - 1}f(y_{c}|\theta),\ \frac{1 - b}{2(a - y_{c})} \right\}} & {\text{if } |y| = y_{c}} \\ 0 & \text{otherwise} \end{matrix} \right.} & (7) \end{matrix}$ where 0≦b≦1, 0<d≦y_(c)<a, and a represents the largest magnitude a sample y can take. Here both a and d are assumed to be known. It is not hard to see that, given (y_(c), b, θ), p(y|y_(c), b, θ) as a function of y is indeed a pdf and is symmetric with respect to the origin.

According to the TCM defined in (7), a sample y is generated according to the truncated distribution

$\frac{1}{{2{F\left( {y_{c}❘\theta} \right)}} - 1}{f\left( {y❘\theta} \right)}$ with probability b, and according to the uniform distribution

$\frac{1}{2\left( {a - y_{c}} \right)}$ (also called the outlier distribution) with probability 1-b. The composite model is transparent since given parameters (y_(c), b, θ), there is no ambiguity regarding which distribution a sample y≠±y_(c) comes from. At y=±y_(c), p(y|y_(c), b, θ) can be defined arbitrarily since one can arbitrarily modify the value of a pdf over a set of zero Lebesgue measure without changing its cdf. As shown later, selecting p(y|y_(c), b, θ) at y=±y_(c) to be the maximum of

$\frac{b}{{2{F\left( {y_{c}❘\theta} \right)}} - 1}{f\left( {y_{c}❘\theta} \right)}\mspace{14mu}{and}\mspace{14mu}\frac{1 - b}{2\left( {a - y_{c}} \right)}$ will facilitate our subsequent argument for ML estimation. Hereafter, samples from the outlier distribution will be referred to as outliers.
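By way of a non-limiting illustration, the composite density in (7) can be checked numerically. The following minimal Python sketch assumes a Laplacian base distribution of the form f(y|λ)=e^(−|y|/λ)/(2λ) and uses hypothetical helper names (`tcm_pdf`, `integrate`) and arbitrary example parameter values; none of these names or values is part of the described embodiments. It evaluates (7) and verifies by midpoint integration that the total mass is 1 and that the main portion carries mass b.

```python
import math

def laplace_pdf(y, lam):
    """Laplacian density with scale lam: exp(-|y|/lam) / (2*lam)."""
    return math.exp(-abs(y) / lam) / (2.0 * lam)

def laplace_cdf(y, lam):
    """Laplacian cdf matching laplace_pdf."""
    if y < 0:
        return 0.5 * math.exp(y / lam)
    return 1.0 - 0.5 * math.exp(-y / lam)

def tcm_pdf(y, y_c, b, a, pdf, cdf):
    """Composite density (7) for a symmetric base pdf/cdf (hypothetical helper)."""
    main_scale = b / (2.0 * cdf(y_c) - 1.0)   # b / (2F(y_c) - 1)
    tail = (1.0 - b) / (2.0 * (a - y_c))      # uniform outlier density
    mag = abs(y)
    if mag < y_c:
        return main_scale * pdf(y)
    if mag == y_c:
        return max(main_scale * pdf(y_c), tail)
    return tail if mag <= a else 0.0

def integrate(fn, lo, hi, steps=20000):
    """Midpoint-rule quadrature, adequate for this sanity check."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(steps))

# Example parameter values (assumptions chosen only for illustration).
lam, y_c, b, a = 5.0, 30.0, 0.9, 100.0

def density(y):
    return tcm_pdf(y, y_c, b, a,
                   lambda v: laplace_pdf(v, lam),
                   lambda v: laplace_cdf(v, lam))

total_mass = integrate(density, -a, a)      # expected to be close to 1
main_mass = integrate(density, -y_c, y_c)   # expected to be close to b
```

Because the base pdf and cdf are passed in as callables, the same helper could be reused with another symmetric family, under the concavity assumption stated above.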

3.2 ML Estimate of TCM Parameters

In practice, parameters y_(c), b, θ are often unknown and hence have to be estimated, say, through ML estimation. Let Y₁ ^(n)=Y₁, Y₂, . . ., Y_(n) be a sequence of DCT coefficients in an image or in a large coding unit (such as a block, a slice or a frame in video coding) at a particular frequency or across frequencies of interest. Assume that Y₁ ^(n) behaves according to the TCM defined in (7) with

$Y_{\max}\overset{\Delta}{=}\max\{|Y_{i}| : 1 \leq i \leq n\} < a$

and Y_(max)≧d. (When Y_(max)<d, there would be no outliers and the ML estimates of y_(c) and b are equal to d and 1, respectively.) We next investigate how to compute the ML estimate of y_(c), b and θ.

Given Y₁ ^(n) with d≦Y_(max)<a, let

$N_{1}(y_{c})\overset{\Delta}{=}\{i : |Y_{i}| < y_{c}\},\quad N_{2}(y_{c})\overset{\Delta}{=}\{i : y_{c} < |Y_{i}|\},\quad N_{3}(y_{c})\overset{\Delta}{=}\{i : |Y_{i}| = y_{c}\}.$

Then the log-likelihood function g(y_(c),b,θ|Y₁ ^(n)) according to (7) is equal to

$\begin{matrix} \begin{matrix} {g(y_{c},b,\theta|Y_{1}^{n})} & {\overset{1)}{=} |N_{2}(y_{c})|\left\lbrack \ln(1 - b) - \ln 2(a - y_{c}) \right\rbrack + |N_{1}(y_{c})|\left\{ \ln b - \ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack \right\} + \sum\limits_{i \in N_{1}(y_{c})}\ln f(Y_{i}|\theta)} \\ {} & {\quad + |N_{3}(y_{c})|\max\left\{ \ln(1 - b) - \ln 2(a - y_{c}),\ \ln b - \ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack + \ln f(y_{c}|\theta) \right\}} \\ {} & {= |N_{2}(y_{c})|\ln(1 - b) + |N_{1}(y_{c})|\ln b + \sum\limits_{i \in N_{1}(y_{c})}\ln f(Y_{i}|\theta)} \\ {} & {\quad + |N_{3}(y_{c})|\max\left\{ \ln(1 - b) - \ln 2(a - y_{c}),\ \ln b - \ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack + \ln f(y_{c}|\theta) \right\}} \\ {} & {\quad - |N_{2}(y_{c})|\ln 2(a - y_{c}) - |N_{1}(y_{c})|\ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack} \end{matrix} & (8) \end{matrix}$ where |S| denotes the cardinality of a finite set S, and the equality 1) is due to (7) and the fact that ln z is strictly increasing in the region z>0. Since F(y|θ) is nondecreasing with respect to y, it follows from (8) that for any Y_(max)<y_(c)<a,

$g(y_{c},b,\theta|Y_{1}^{n}) \leq n\left\{ \ln b - \ln\left\lbrack 2F(Y_{\max}|\theta) - 1 \right\rbrack \right\} + \sum\limits_{i = 1}^{n}\ln f(Y_{i}|\theta) \leq g(Y_{\max},b,\theta|Y_{1}^{n}).$

Therefore, we have

$\begin{matrix} {\max\left\{ g(y_{c},b,\theta|Y_{1}^{n}) : d \leq y_{c} < a,\ 0 \leq b \leq 1,\ \theta \right\} = \max\left\{ g(y_{c},b,\theta|Y_{1}^{n}) : d \leq y_{c} \leq Y_{\max},\ 0 \leq b \leq 1,\ \theta \right\}.} & (9) \end{matrix}$

To continue, we now sort |Y₁|, |Y₂|, . . ., |Y_(n)| in ascending order into W₁≦W₂≦ . . . ≦W_(n). Note that W_(n)=Y_(max). Let m be the smallest integer i such that W_(i)≧d. Define I _(m)=(d,W _(m)) and for any m<i≦n, I _(i)=(W _(i−1) ,W _(i)).

Then it is easy to see that the interval [d, Y_(max)] can be decomposed as [d,Y _(max) ]={d,W _(m) ,W _(m+1) , . . .,W _(n)}∪(∪_(i=m) ^(n) I _(i)) which, together with (9), implies that

$\begin{matrix} {{\max\left\{ {{{{g\left( {y_{c},b,{\theta ❘Y_{1}^{n}}} \right)}\text{:}d} \leq y_{c} < a},{0 \leq b \leq 1},\theta} \right\}} = {{\max\limits_{0 \leq b \leq 1}\;{\max\limits_{\theta}\;{\max\limits_{y_{c} \in {\lbrack{d,Y_{\max}}\rbrack}}\;{g\left( {y_{c},b,{\theta ❘Y_{1}^{n}}} \right)}}}} = {\max\limits_{b,\theta}\;{\max\;{\left\{ {{g\left( {d,b,{\theta ❘Y_{1}^{n}}} \right)},{g\left( {W_{i},b,{\theta ❘Y_{1}^{n}}} \right)},{{\sup\limits_{y_{c} \in I_{i}}\;\left\lbrack {g\left( {y_{c},b,{\theta ❘Y_{1}^{n}}} \right)} \right\rbrack}:{m \leq i \leq n}}} \right\}.}}}}} & (10) \end{matrix}$

Note that for any nonempty I_(i) with i>m, N₁(y_(c)) and N₂(y_(c)) remain the same and N₃(y_(c)) is empty for all y_(c)∈I_(i). Since by assumption F(y|θ) as a function of y is concave, it is not hard to verify that, as a function of y_(c),

$- |N_{2}(y_{c})|\ln 2(a - y_{c}) - |N_{1}(y_{c})|\ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack$

is convex over y_(c)∈I_(i), and hence its value over y_(c)∈I_(i) is upper bounded by the maximum of its values at y_(c)=W_(i) and y_(c)=W_(i−1), i.e., at the endpoints of I_(i). Therefore, in view of (8), we have

$\begin{matrix} {{\sup\limits_{y_{c} \in I_{i}}\left\lbrack {g\left( {y_{c},b,{\theta ❘Y_{1}^{n}}} \right)} \right\rbrack} \leq {\max\;{\left\{ {{g\left( {W_{i - 1},b,{\theta ❘Y_{1}^{n}}} \right)},{g\left( {W_{i},b,{\theta ❘Y_{1}^{n}}} \right)}} \right\}.}}} & (11) \end{matrix}$

When I_(m) is nonempty, a similar argument leads to

$\begin{matrix} {{\sup\limits_{y_{c} \in I_{m}}\left\lbrack {g\left( {y_{c},b,{\theta ❘Y_{1}^{n}}} \right)} \right\rbrack} \leq {\max{\left\{ {{g\left( {d,b,{\theta ❘Y_{1}^{n}}} \right)},{g\left( {W_{m},b,{\theta ❘Y_{1}^{n}}} \right)}} \right\}.}}} & (12) \end{matrix}$ Putting (10) to (12) together yields

$\begin{matrix} {{\max\left\{ {{{{g\left( {y_{c},b,{\theta ❘Y_{1}^{n}}} \right)}\text{:}d} \leq y_{c} < a},{0 \leq b \leq 1},\theta} \right\}} = {\max\limits_{b,\theta}\;{\max{\left\{ {{g\left( {d,b,{\theta ❘Y_{1}^{n}}} \right)},{{{g\left( {W_{i},b,{\theta ❘Y_{1}^{n}}} \right)}\text{:}m} \leq i \leq n}} \right\}.}}}} & (13) \end{matrix}$ Therefore, the ML estimate of y_(c) is equal to one of d, W_(m), W_(m+1), . . . , W_(n).

We are now led to investigating

$\max\limits_{b,\theta} g(y_{c},b,\theta|Y_{1}^{n})$

for each y_(c)∈{d, W_(m), W_(m+1), . . . , W_(n)}. Let

$N_{1}^{+}(y_{c})\overset{\Delta}{=}\{i : |Y_{i}| \leq y_{c}\},\quad N_{2}^{+}(y_{c})\overset{\Delta}{=}\{i : y_{c} \leq |Y_{i}|\}.$

Further define

$\begin{matrix} {g^{+}(y_{c},b,\theta|Y_{1}^{n})\overset{\Delta}{=} |N_{2}(y_{c})|\left\lbrack \ln(1 - b) - \ln 2(a - y_{c}) \right\rbrack + |N_{1}^{+}(y_{c})|\left\{ \ln b - \ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack \right\} + \sum\limits_{i \in N_{1}^{+}(y_{c})}\ln f(Y_{i}|\theta)} & (14) \end{matrix}$

and

$\begin{matrix} {g^{-}(y_{c},b,\theta|Y_{1}^{n})\overset{\Delta}{=} |N_{2}^{+}(y_{c})|\left\lbrack \ln(1 - b) - \ln 2(a - y_{c}) \right\rbrack + |N_{1}(y_{c})|\left\{ \ln b - \ln\left\lbrack 2F(y_{c}|\theta) - 1 \right\rbrack \right\} + \sum\limits_{i \in N_{1}(y_{c})}\ln f(Y_{i}|\theta).} & (15) \end{matrix}$

Note that the difference between g⁺(y_(c), b, θ|Y₁ ^(n)) and g⁻(y_(c), b, θ|Y₁ ^(n)) lies in whether or not we regard y_(c) itself as an outlier when y_(c) is equal to some W_(i). Comparing (8) with (14) and (15), we have

$\begin{matrix} {g(y_{c},b,\theta|Y_{1}^{n}) = \max\left\{ g^{+}(y_{c},b,\theta|Y_{1}^{n}),\ g^{-}(y_{c},b,\theta|Y_{1}^{n}) \right\}} & (16) \end{matrix}$

and hence

$\begin{matrix} {\max\limits_{b,\theta} g(y_{c},b,\theta|Y_{1}^{n}) = \max\left\{ \max\limits_{b,\theta} g^{+}(y_{c},b,\theta|Y_{1}^{n}),\ \max\limits_{b,\theta} g^{-}(y_{c},b,\theta|Y_{1}^{n}) \right\}.} & (17) \end{matrix}$

Let

$(b(y_{c}),\theta(y_{c}))\overset{\Delta}{=}\arg\max\limits_{b,\theta} g(y_{c},b,\theta|Y_{1}^{n}),$

$(b^{+}(y_{c}),\theta^{+}(y_{c}))\overset{\Delta}{=}\arg\max\limits_{b,\theta} g^{+}(y_{c},b,\theta|Y_{1}^{n}),$

$(b^{-}(y_{c}),\theta^{-}(y_{c}))\overset{\Delta}{=}\arg\max\limits_{b,\theta} g^{-}(y_{c},b,\theta|Y_{1}^{n}).$

Then from (14) and (15), it is not hard to see that

$\begin{matrix} {b^{+}(y_{c}) = \frac{|N_{1}^{+}(y_{c})|}{n}\quad\text{and}\quad b^{-}(y_{c}) = \frac{|N_{1}(y_{c})|}{n}} & (18) \end{matrix}$ and θ⁺(y_(c)) and θ⁻(y_(c)) are the ML estimates of θ for the truncated distribution

$\frac{1}{2F(y_{c}|\theta) - 1}f(y|\theta)$ over the sample sets {Y_(i):i∈N₁ ⁺(y_(c))} and {Y_(i):i∈N₁(y_(c))}, respectively. In view of (17), one can then determine (b(y_(c)), θ(y_(c))) by setting

$\left( {{b\left( y_{c} \right)},{\theta\left( y_{c} \right)}} \right) = \left\{ \begin{matrix} \left( {{b^{+}\left( y_{c} \right)},{\theta^{+}\left( y_{c} \right)}} \right) & {{{if}\mspace{14mu}{g^{+}\left( {y_{c},{b^{+}\left( y_{c} \right)},\left. {\theta^{+}\left( y_{c} \right)} \middle| Y_{1}^{n} \right.} \right)}} \geq {g^{-}\left( {y_{c},{b^{-}\left( y_{c} \right)},\left. {\theta^{-}\left( y_{c} \right)} \middle| Y_{1}^{n} \right.} \right)}} \\ \left( {{b^{-}\left( y_{c} \right)},{\theta^{-}\left( y_{c} \right)}} \right) & {{{otherwise}.}\;} \end{matrix} \right.$

Finally, the ML estimate of (y_(c), b, θ) can be determined as

$\begin{matrix} {y_{c}^{*} = \arg\max\limits_{y_{c} \in \{ d,W_{m},\ldots,W_{n}\}} g(y_{c},b(y_{c}),\theta(y_{c})|Y_{1}^{n}),\quad b^{*} = b(y_{c}^{*}),\quad \theta^{*} = \theta(y_{c}^{*}).} & (20) \end{matrix}$

Summarizing the above derivations into Algorithm 1 (FIG. 12) for computing (y_(c)*,b*,θ*), we have proved the following result.

Theorem 1: The vector (y_(c)*, b*, θ*) computed by Algorithm 1 is indeed the ML estimate of (y_(c), b, θ) in the TCM specified in (7).

Remark 1: When implementing Algorithm 1 for a sequence {Y_(i)}_(i=1) ^(n) of DCT coefficients with a flat tail as shown in FIGS. 1 and 2, one can choose a to be Y_(max) and apply Algorithm 1 to the sample set {Y_(i): |Y_(i)|<Y_(max), 1≦i≦n}. As for the selection of d>0, it follows from Algorithm 1 that the larger d is, the less computation Algorithm 1 requires. In our experiments, we have found that choosing d>0 such that n−m is around 20% of n is a good choice since the flat tail portion is normally not significant statistically and would contain less than 20% of the total samples.

Depending on whether or not Step 6 in Algorithm 1 can be implemented efficiently, the computation complexity of Algorithm 1 varies from one parametric family f(y|θ) to another. For some parametric family f(y|θ) such as Laplacian distributions, Step 6 can be easily solved and hence Algorithm 1 can be implemented efficiently. On the other hand, when f(y|θ) is the GG family, Step 6 is quite involved. In the next two subsections, we will examine Step 6 in two cases: (1) f(y|θ) is the Laplacian family, and the corresponding TCM is referred to as the LPTCM; and (2) f(y|θ) is the GG family, and the corresponding TCM is referred to as the GGTCM.

3.3 LPTCM

Plugging the Laplacian density function in (3) into (7), we get the LPTCM given by

$\begin{matrix} {p(y|y_{c},b,\lambda)\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{b}{1 - e^{- y_{c}/\lambda}}\frac{1}{2\lambda}e^{- |y|/\lambda}} & {\text{if } |y| < y_{c}} \\ {\frac{1 - b}{2(a - y_{c})}} & {\text{if } y_{c} < |y| \leq a} \\ {\max\left\{ \frac{b}{1 - e^{- y_{c}/\lambda}}\frac{1}{2\lambda}e^{- |y|/\lambda},\ \frac{1 - b}{2(a - y_{c})} \right\}} & {\text{if } |y| = y_{c}} \\ 0 & {\text{otherwise}.} \end{matrix} \right.} & (21) \end{matrix}$

With reference to Step 6 in Algorithm 1, let S be either N₁ ⁺(y_(c)) or N₁(y_(c)). Then Step 6 in Algorithm 1 is equivalent to determining the ML estimate (denoted by λ_(y_(c))) of λ in the truncated Laplacian distribution

$\begin{matrix} {p(y|\lambda)\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{1}{1 - e^{- y_{c}/\lambda}}\frac{1}{2\lambda}e^{- |y|/\lambda}} & {\text{if } |y| \leq y_{c}} \\ 0 & \text{otherwise} \end{matrix} \right.} & (22) \end{matrix}$ from the sample set {Y_(i):i∈S}. Since |Y_(i)|≦y_(c) for any i∈S, the log-likelihood function of the sample set {Y_(i):i∈S} with respect to p(y|λ) is equal to

$L(\lambda)\overset{\Delta}{=} - |S|\left\lbrack \ln 2\lambda + \ln\left( 1 - e^{- y_{c}/\lambda} \right) \right\rbrack - \frac{1}{\lambda}\sum\limits_{i \in S}|Y_{i}|.$

Then we have

$\lambda_{y_{c}} = \arg\max\limits_{0 \leq \lambda \leq \infty} L(\lambda).$

It is not hard to verify that L(1/t) as a function of t>0 is strictly concave. Computing the derivative of L(λ) with respect to λ and setting it to 0 yields

$\begin{matrix} {\lambda - \frac{y_{c} \cdot e^{- y_{c}/\lambda}}{1 - e^{- y_{c}/\lambda}} - \frac{1}{|S|}\sum\limits_{i \in S}|Y_{i}| = 0.} & (23) \end{matrix}$ It can be shown (see the proof of Theorem 2 below) that

${s(\lambda)}\overset{\bigtriangleup}{=}{\lambda - \frac{y_{c} \cdot {\mathbb{e}}^{{- y_{c}}/\lambda}}{1 - {\mathbb{e}}^{{- y_{c}}/\lambda}}}$ is a strictly increasing function of λ>0, and

$\begin{matrix} {\lim\limits_{\lambda\rightarrow 0^{+}} s(\lambda) = 0\quad\text{and}\quad\lim\limits_{\lambda\rightarrow\infty} s(\lambda) = \frac{y_{c}}{2}.} & (24) \end{matrix}$

Let

$C = \frac{1}{|S|}\sum\limits_{i \in S}|Y_{i}|.$

Then it follows that (1) when C=0, λ_(y_(c))=0, in which case the corresponding truncated Laplacian distribution degenerates to a delta function; (2) when C≧y_(c)/2, λ_(y_(c))=∞, in which case the corresponding truncated Laplacian distribution degenerates to the uniform distribution over [−y_(c), y_(c)]; and (3) when 0<C<y_(c)/2, λ_(y_(c)) is equal to the unique root of (23).

We are now led to solving (23) when 0<C<y_(c)/2. To this end, we developed the iterative procedure described in Algorithm 2 (FIG. 13).
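Algorithm 2 itself appears in FIG. 13 and is not reproduced in this text. As a hedged illustration only, the sketch below implements the fixed-point update that can be inferred from (23) and (27), namely λ₀=C and λ_(i+1)=C+y_(c)e^(−y_(c)/λ_(i))/(1−e^(−y_(c)/λ_(i))), together with the three cases listed above. The function name `truncated_laplace_scale`, the stopping tolerance, and the iteration cap are assumptions made for this example.

```python
import math

def truncated_laplace_scale(C, y_c, tol=1e-12, max_iter=10000):
    """ML scale of the truncated Laplacian (22) given C = mean(|Y_i|).

    Returns 0.0 when C == 0, math.inf when C >= y_c / 2, and otherwise an
    approximation of the unique root of (23) obtained by iterating
    lam_{i+1} = C + y_c * exp(-y_c/lam_i) / (1 - exp(-y_c/lam_i)).
    """
    if C <= 0.0:
        return 0.0                 # case (1): delta function
    if C >= y_c / 2.0:
        return math.inf            # case (2): uniform on [-y_c, y_c]
    lam = C                        # lambda_0 = C
    for _ in range(max_iter):
        x = y_c / lam
        # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
        nxt = C + y_c * math.exp(-x) / -math.expm1(-x)
        if nxt - lam <= tol * nxt:  # iterates increase, so the gap is >= 0
            return nxt
        lam = nxt
    return lam

# Example values (assumptions): C = 4 and y_c = 30.
lam_hat = truncated_laplace_scale(4.0, 30.0)
residual = lam_hat - 30.0 * math.exp(-30.0 / lam_hat) / -math.expm1(-30.0 / lam_hat) - 4.0
```

The residual computed at the end is the left-hand side of (23) evaluated at the returned value, which gives a direct way to confirm that the iteration has settled on the root.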

Theorem 2 below shows that Algorithm 2 converges exponentially fast when 0<C<y_(c)/2.

Theorem 2: Assume that 0<C<y_(c)/2. Then λ_(i) computed in Step 9 of Algorithm 2 strictly increases and converges exponentially fast to λ_(y_(c)) as i→∞.

Proof: Define

${r(\lambda)} = {\lambda - \frac{y_{c} \cdot {\mathbb{e}}^{{- y_{c}}/\lambda}}{1 - {\mathbb{e}}^{{- y_{c}}/\lambda}} - {C.}}$ It is not hard to verify that the derivative of r(λ) with respect to λ is

$\begin{matrix} {{r^{\prime}(\lambda)} = {{1 - {\frac{{\mathbb{e}}^{{- y_{c}}/\lambda}}{\left\lbrack {1 - {\mathbb{e}}^{{- y_{c}}/\lambda}} \right\rbrack^{2}}\frac{y_{c}^{2}}{\lambda^{2}}}} > 0}} & (26) \end{matrix}$ for any λ>0. Therefore, r(λ) is strictly increasing over λ>0.

Since λ₀=C>0, it follows from (25) that λ₁>λ₀. In general, for any i≧1, we have

$\begin{matrix} \begin{matrix} {\lambda_{i + 1} - \lambda_{i}} & {= \frac{y_{c} \cdot e^{- y_{c}/\lambda_{i}}}{1 - e^{- y_{c}/\lambda_{i}}} - \frac{y_{c} \cdot e^{- y_{c}/\lambda_{i - 1}}}{1 - e^{- y_{c}/\lambda_{i - 1}}}} \\ {} & {= y_{c}\left\lbrack \frac{1}{e^{y_{c}/\lambda_{i}} - 1} - \frac{1}{e^{y_{c}/\lambda_{i - 1}} - 1} \right\rbrack} \end{matrix} & (27) \end{matrix}$ which implies that λ_(i+1)−λ_(i)>0 whenever λ_(i)−λ_(i−1)>0. By mathematical induction, it then follows that λ_(i) strictly increases as i increases.

We next show that all λ_(i), i≧1, are bounded. Indeed, it follows from (25) that

$\begin{matrix} {r(\lambda_{i})} & {= \lambda_{i} - \frac{y_{c} \cdot e^{- y_{c}/\lambda_{i}}}{1 - e^{- y_{c}/\lambda_{i}}} - C} \\ {} & {= \lambda_{i} - \lambda_{i + 1}} \\ {} & {< 0} \end{matrix}$ which, together with (26) and the fact that r(λ_(y_(c)))=0, implies that λ_(i)<λ_(y_(c)). Therefore λ_(i) converges as i→∞. Letting i→∞ in (25) yields

$\begin{matrix} {{\lim\limits_{i->\infty}\lambda_{i}} = {\lambda_{y_{c}}.}} & (28) \end{matrix}$

All that remains is to show that the convergence in (28) is exponentially fast. To this end, let

$\delta\overset{\Delta}{=}{\max\limits_{\lambda_{0} \leq \lambda \leq \lambda_{y_{c}}}{\frac{{\mathbb{e}}^{{- y_{c}}/\lambda}}{\left\lbrack {1 - {\mathbb{e}}^{{- y_{c}}/\lambda}} \right\rbrack^{2}}{\left( \frac{y_{c}}{\lambda} \right)^{2}.}}}$

Then it follows from (26) that δ<1. This, together with (27), implies that λ_(i+1)−λ_(i)≦δ(λ_(i)−λ_(i−1)) for any i≧1, and hence λ_(i) converges to λ_(y_(c)) exponentially fast. This completes the proof of Theorem 2.

Plugging Algorithm 2 into Step 6 in Algorithm 1, one then gets an efficient algorithm for computing the ML estimate of (y_(c), b, λ) in the LPTCM. To illustrate the effectiveness of the LPTCM, the resulting algorithm was applied to the same DCT coefficients shown in FIG. 1. FIG. 3 shows the resulting LPTCM against the histogram of DCT coefficients on the whole in each respective case. FIG. 4 further zooms in on the tail portion of FIG. 3. From FIGS. 3 and 4, it is clear that the LPTCM fits the histogram of DCT coefficients quite well and greatly improves upon the Laplacian model in each case. In comparison with the Laplacian model, it fits both the main and tail portions better. In terms of χ² values, it matches up to the GG model. More detailed comparisons will be presented in Section 5.
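As a non-limiting sketch of how the pieces above could fit together, the following Python example searches y_(c) over the candidate set {d, W_(m), . . . , W_(n)} from (13), uses the closed-form b from (18), and fits λ with the fixed-point update inferred from (23) and (27). It is not a reproduction of Algorithm 1 or Algorithm 2 (FIGS. 12 and 13). The names `fit_lptcm`, `fit_lambda`, `main_loglik`, `xlog`, and `sample_lptcm` are hypothetical, a is assumed to be strictly larger than every sample magnitude, and candidates whose main portion is empty or all zero are skipped as a simplification.

```python
import bisect
import math
import random

def fit_lambda(C, y_c, tol=1e-10, max_iter=1000):
    """Scale of the truncated Laplacian (22): root of (23), or inf if C >= y_c/2."""
    if C >= y_c / 2.0:
        return math.inf                      # degenerates to uniform on [-y_c, y_c]
    lam = C                                  # lambda_0 = C
    for _ in range(max_iter):
        x = y_c / lam
        nxt = C + y_c * math.exp(-x) / -math.expm1(-x)
        if nxt - lam <= tol * nxt:
            return nxt
        lam = nxt
    return lam

def main_loglik(k, abs_sum, y_c, lam):
    """L(lambda) for k main-portion samples whose magnitudes sum to abs_sum."""
    if math.isinf(lam):
        return -k * math.log(2.0 * y_c)
    return -k * (math.log(2.0 * lam) + math.log(-math.expm1(-y_c / lam))) - abs_sum / lam

def xlog(k, p):
    """k * ln(p) with the convention 0 * ln(0) = 0."""
    return 0.0 if k == 0 else k * math.log(p)

def fit_lptcm(samples, a, d):
    """Return (log-likelihood, y_c, b, lam); requires a > max(|sample|)."""
    w = sorted(abs(y) for y in samples)
    n = len(w)
    prefix = [0.0]
    for v in w:
        prefix.append(prefix[-1] + v)        # prefix[k] = sum of the k smallest magnitudes
    best = None
    for y_c in sorted({d} | {v for v in w if v >= d}):
        upto = bisect.bisect_right(w, y_c)   # |N1+(y_c)|, used by g+
        below = bisect.bisect_left(w, y_c)   # |N1(y_c)|, used by g-
        for k in ((upto,) if upto == below else (upto, below)):
            if k == 0 or prefix[k] == 0.0:
                continue                     # simplification: skip degenerate main portion
            lam = fit_lambda(prefix[k] / k, y_c)
            b = k / n                        # closed form from (18)
            g = (xlog(n - k, 1.0 - b) - (n - k) * math.log(2.0 * (a - y_c))
                 + xlog(k, b) + main_loglik(k, prefix[k], y_c, lam))
            if best is None or g > best[0]:
                best = (g, y_c, b, lam)
    return best

def sample_lptcm(n, y_c, b, lam, a, rng):
    """Draw n samples from the LPTCM (21); used only to build test data."""
    out = []
    for _ in range(n):
        if rng.random() < b:                 # main portion, inverse-cdf sampling
            mag = -lam * math.log1p(rng.random() * math.expm1(-y_c / lam))
        else:                                # outlier, uniform magnitude in (y_c, a)
            mag = rng.uniform(y_c, a)
        out.append(mag if rng.random() < 0.5 else -mag)
    return out
```

The prefix sums let every candidate reuse the sorted magnitudes, so each candidate costs one binary search plus one scalar root-finding run. On synthetic data generated by `sample_lptcm`, the recovered parameters would be expected to land near the generating values, subject to sampling noise.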

3.4 GGTCM

Plugging the GG density function in (1) into (7), we get the GGTCM given by

$\begin{matrix} {p(y|y_{c},b,\alpha,\beta)\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{b\,\beta}{2\alpha\,\gamma\left( 1/\beta,(y_{c}/\alpha)^{\beta} \right)}e^{- (|y|/\alpha)^{\beta}}} & {\text{if } |y| < y_{c}} \\ {\frac{1 - b}{2(a - y_{c})}} & {\text{if } y_{c} < |y| \leq a} \\ {\max\left\{ \frac{b\,\beta}{2\alpha\,\gamma\left( 1/\beta,(y_{c}/\alpha)^{\beta} \right)}e^{- (|y|/\alpha)^{\beta}},\ \frac{1 - b}{2(a - y_{c})} \right\}} & {\text{if } |y| = y_{c}} \\ 0 & \text{otherwise} \end{matrix} \right.} & (29) \end{matrix}$ where γ(s,x) is defined as

$\gamma(s,x)\overset{\Delta}{=}\int_{0}^{x} t^{s - 1}e^{- t}\,dt.$

With reference to Algorithm 1, in this case, Step 6 in Algorithm 1 is equivalent to determining the ML estimate (denoted by (α_(y) _(c) , β_(y) _(c) )) of (α,β) in the truncated GG distribution

$\begin{matrix} {p(y|\alpha,\beta)\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{\beta}{2\alpha\,\gamma\left( 1/\beta,(y_{c}/\alpha)^{\beta} \right)}e^{- (|y|/\alpha)^{\beta}}} & {\text{if } |y| \leq y_{c}} \\ 0 & \text{otherwise} \end{matrix} \right.} & (30) \end{matrix}$ from the sample set {Y_(i):i∈S}. Since |Y_(i)|≦y_(c) for any i∈S, the log-likelihood function of the sample set {Y_(i):i∈S} with respect to p(y|α,β) is equal to

$L(\alpha,\beta)\overset{\Delta}{=} |S|\left\lbrack \ln\beta - \ln 2\alpha - \ln\gamma\left( \frac{1}{\beta},\left( \frac{y_{c}}{\alpha} \right)^{\beta} \right) \right\rbrack - \sum\limits_{i \in S}\left| \frac{Y_{i}}{\alpha} \right|^{\beta}.$

Therefore

$(\alpha_{y_{c}},\beta_{y_{c}}) = \arg\max\limits_{\alpha,\beta} L(\alpha,\beta).$

Computing the partial derivatives of L(α,β) with respect to α and β and setting them to zero yields

$\begin{matrix} \left\{ \begin{matrix} {\frac{1}{t} = \beta\left\lbrack \frac{1}{|S|}\sum\limits_{i \in S}\left| \frac{Y_{i}}{y_{c}} \right|^{\beta} + \frac{t^{1/\beta - 1}e^{- t}}{\gamma(1/\beta,t)} \right\rbrack} \\ {\beta = \ln t - \frac{\int_{0}^{t} y^{1/\beta - 1}e^{- y}\ln y\,dy}{\gamma(1/\beta,t)} + \frac{t\,\beta^{2}}{|S|}\sum\limits_{i \in S}\left| \frac{Y_{i}}{y_{c}} \right|^{\beta}\ln\left| \frac{Y_{i}}{y_{c}} \right|} \end{matrix} \right. & (31) \end{matrix}$ where t=(y_(c)/α)^(β). One can then take a solution to (31) as (α_(y_(c)), β_(y_(c))).

Unlike the case of LPTCM, however, solving (31) does not seem to be easy. In particular, at this point, we do not know whether (31) admits a unique solution. There is no developed algorithm with global convergence to compute such a solution either even if the solution is unique. As such, Step 6 in Algorithm 1 in the case of GGTCM is much more complicated than that in the case of LPTCM.

Suboptimal alternatives are to derive approximate solutions to (31). One approach is to solve the two equations in (31) iteratively, starting with an initial value of β given by (2): (1) fix β and solve the first equation in (31); (2) fix α and solve the second equation in (31); and (3) repeat these two steps until no noticeable improvement can be made. Together with this suboptimal solution to (31), Algorithm 1 was applied to the same DCT coefficients shown in FIG. 1. FIG. 3 shows the resulting GGTCM against the histogram of DCT coefficients on the whole in each respective case. We note that the resulting GGTCM improves on the GG model only marginally, which may be due to the suboptimal solution to (31).

4 Discrete Transparent Composite Model

Though DCT in theory provides a mapping from a real-valued space to another real-valued space and generates continuous DCT coefficients, in practice (particularly in lossy image and video coding), DCT is often designed and implemented as a mapping from an integer-valued space (e.g., 8-bit pixels) to another integer-valued space and gives rise to integer DCT coefficients (e.g., 12-bit DCT coefficients in H.264). In addition, since most images and video are stored in a compressed format such as JPEG, H.264, etc., for applications (e.g., image enhancement, image retrieval, image annotation, etc.) based on compressed images and video, DCT coefficients are available only in their quantized values. Therefore, it is desirable to establish a good model for discrete (integer or quantized) DCT coefficients as well.

Following the idea of continuous TCM, in this section we develop a discrete TCM which partitions discrete DCT coefficients into the main and tail portions, and models the main portion by a discrete parametric distribution and the tail portion by a discrete uniform distribution. The particular discrete parametric distribution we will consider is a truncated geometric distribution, and the resulting discrete TCM is referred to as the GMTCM. To provide a uniform treatment for both integer and quantized DCT coefficients, we introduce a quantization step size q. Then both integer and quantized DCT coefficients can be regarded as integers multiplied by a properly chosen step size.

4.1 GMTCM

Uniform quantization with dead zone is widely used in image and video coding (see, for example, H.264 and HEVC). Mathematically, a uniform quantizer with dead zone and step size q is given by

$Q(X) = q \times \operatorname{sign}(X) \times \operatorname{round}\left( \frac{|X| - (\Delta - q/2)}{q} \right)$ where q/2≦Δ<q. Its input-output relationship is shown in FIG. 5. Assume that the input X is distributed according to the Laplacian distribution in (3). Then the quantized index

$\operatorname{sign}(X) \times \operatorname{round}\left( \frac{|X| - (\Delta - q/2)}{q} \right)$ is distributed as follows

$\begin{matrix} {p_{0} = 1 - e^{- \frac{\Delta}{\lambda}},\qquad p_{i} = \frac{1}{2}e^{- \frac{\Delta}{\lambda}}\left\lbrack 1 - e^{- \frac{q}{\lambda}} \right\rbrack e^{- \frac{q}{\lambda}(|i| - 1)},\quad i = \pm 1, \pm 2, \ldots} & (32) \end{matrix}$
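The index distribution in (32) can be sanity-checked by simulation. The sketch below is illustrative only: it assumes the Laplacian in (3) has density e^(−|x|/λ)/(2λ), implements rounding as round-half-up via `math.floor`, and uses hypothetical function names (`quantize_index`, `index_pmf`, `empirical_pmf`) and arbitrary example values of q, Δ, and λ.

```python
import math
import random

def quantize_index(x, q, delta):
    """Index of the dead-zone quantizer: sign(x) * round((|x| - (delta - q/2)) / q)."""
    level = math.floor((abs(x) - (delta - q / 2.0)) / q + 0.5)  # round half up
    return level if x >= 0 else -level

def index_pmf(i, q, delta, lam):
    """Closed-form index probability from (32) for Laplacian input with scale lam."""
    if i == 0:
        return 1.0 - math.exp(-delta / lam)
    return (0.5 * math.exp(-delta / lam) * (1.0 - math.exp(-q / lam))
            * math.exp(-(q / lam) * (abs(i) - 1)))

def empirical_pmf(n, q, delta, lam, rng):
    """Relative index frequencies over n simulated Laplacian inputs."""
    counts = {}
    for _ in range(n):
        mag = rng.expovariate(1.0 / lam)       # |X| is exponential with mean lam
        x = mag if rng.random() < 0.5 else -mag
        i = quantize_index(x, q, delta)
        counts[i] = counts.get(i, 0) + 1
    return {i: c / n for i, c in counts.items()}

# Example values (assumptions): q = 2, delta = 1.5 (so q/2 <= delta < q), lam = 4.
q, delta, lam = 2.0, 1.5, 4.0
closed_form_total = sum(index_pmf(i, q, delta, lam) for i in range(-400, 401))
freqs = empirical_pmf(100000, q, delta, lam, random.Random(1))
```

Under these assumptions the dead zone is the interval |X|<Δ, and each nonzero index i collects the inputs with Δ+(|i|−1)q ≦ |X| < Δ+|i|q, which is where the geometric factor in (32) comes from.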

With the help of q, discrete (integer or quantized) DCT coefficients then take values of integers multiplied by q. (Hereafter, these integers will be referred to as DCT indices.) Note that p_(i) in (32) is essentially a geometric distribution. Using a geometric distribution to model the main portion of discrete DCT coefficients, we then get the GMTCM given by

$\begin{matrix} \left\{ \begin{matrix} {p_{0} = b\,p} & {} \\ {p_{i} = b(1 - p)\frac{1}{2}\left\lbrack 1 - e^{- \frac{q}{\lambda}} \right\rbrack e^{- \frac{q}{\lambda}(|i| - 1)}/\left( 1 - e^{- \frac{q}{\lambda}K} \right)} & {\text{if } i = \pm 1, \pm 2, \ldots, \pm K} \\ {p_{i} = \frac{1 - b}{2(a - K)}} & {\text{if } K < |i| \leq a} \end{matrix} \right. & (33) \end{matrix}$ where 0≦p≦1 is the probability of the zero coefficient, 0≦b≦1, 1≦K≦a, and a is the largest index in a given sequence of DCT indices. Here a is assumed known, and b, p, λ and K are model parameters.
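For illustration, the probability mass function in (33) can be written directly as a short function and checked to sum to one. The function name `gmtcm_pmf` and the example parameter values below are assumptions for this sketch only.

```python
import math

def gmtcm_pmf(i, b, p, lam, K, a, q=1.0):
    """Probability of DCT index i under the GMTCM (33)."""
    m = abs(i)
    if m == 0:
        return b * p
    if m <= K:
        t = q / lam
        # -expm1(-x) equals 1 - exp(-x).
        return (b * (1.0 - p) * 0.5 * -math.expm1(-t) * math.exp(-t * (m - 1))
                / -math.expm1(-t * K))
    if m <= a:
        return (1.0 - b) / (2.0 * (a - K))   # uniform tail over K < |i| <= a
    return 0.0

# Example values (assumptions chosen only for illustration).
b, p, lam, K, a = 0.93, 0.4, 4.0, 10, 60
total = sum(gmtcm_pmf(i, b, p, lam, K, a) for i in range(-a, a + 1))
main = sum(gmtcm_pmf(i, b, p, lam, K, a) for i in range(-K, K + 1) if i != 0)
```

The nonzero main-portion indices carry total mass b(1−p), the zero index carries bp, and the 2(a−K) tail indices share the remaining 1−b equally.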

4.2 ML Estimate of GMTCM Parameters

4.2.1 Algorithms

Let u^(n)=u₁, u₂, . . . , u_(n) be a sequence of DCT indices. Assume that u^(n) behaves according to the GMTCM defined by (33) with

$u_{\max}\overset{\Delta}{=}\max\{|u_{i}| : 1 \leq i \leq n\} \leq a.$

We now investigate how to compute the ML estimate (b*,p*,λ*,K*) of (b,p,λ,K) from u^(n).

Let N ₀ ={j:u _(j)=0},N ₁(K)={j:0<|u _(j) |≦K}, and N ₂(K)={j:|u _(j) |>K}.

The log-likelihood function of u^(n) according to (33) is equal to

$\begin{matrix} \begin{matrix} {G(K,\lambda,b,p)} & {\overset{\Delta}{=} |N_{2}(K)|\ln(1 - b) + \left( |N_{0}| + |N_{1}(K)| \right)\ln b + |N_{0}|\ln p + |N_{1}(K)|\ln(1 - p) - |N_{2}(K)|\ln 2(a - K)} \\ {} & {\quad + |N_{1}(K)|\ln\frac{1 - e^{- \frac{q}{\lambda}}}{2\left( 1 - e^{- \frac{q}{\lambda}K} \right)} - \frac{q}{\lambda}\sum\limits_{j \in N_{1}(K)}\left( |u_{j}| - 1 \right).} \end{matrix} & (34) \end{matrix}$

Then we have

$\begin{matrix} {(b^{*},p^{*},\lambda^{*},K^{*}) = \arg\max\limits_{b,p,\lambda,K} G(K,\lambda,b,p).} & (35) \end{matrix}$

For any 1≦K≦a, let

$L(K,\lambda)\overset{\Delta}{=} |N_{1}(K)|\ln\frac{1 - e^{- \frac{q}{\lambda}}}{2\left( 1 - e^{- \frac{q}{\lambda}K} \right)} - \frac{q}{\lambda}\sum\limits_{j \in N_{1}(K)}\left( |u_{j}| - 1 \right)$

and

$(b(K),p(K),\lambda_{K})\overset{\Delta}{=}\arg\max\limits_{b,p,\lambda} G(K,\lambda,b,p).$

In view of (34), one can verify that

$\begin{matrix} {b(K) = \frac{|N_{0}| + |N_{1}(K)|}{n},\qquad p(K) = \frac{|N_{0}|}{|N_{0}| + |N_{1}(K)|},} & (36) \end{matrix}$

and whenever K>1,

$\lambda_{K} = \arg\max\limits_{0 \leq \lambda \leq \infty} L(K,\lambda).$

When K=1, G(K,λ,b,p) does not depend on λ and hence λ₁ can be selected arbitrarily.

We are now led to determining λ_(K) for each 1<K≦a. At this point, we invoke the following lemma, which is proved in Appendix A (below).

Lemma 1: Let

${g(t)}\overset{\Delta}{=}{\frac{{\mathbb{e}}^{- t}}{1 - {\mathbb{e}}^{- t}} - {\frac{K\;{\mathbb{e}}^{{- K}\; t}}{1 - {\mathbb{e}}^{{- K}\; t}}.}}$

Then for any 1<K≦a, L(K, q/t) as a function of t>0 is strictly concave, and for any K>1, g(t) is strictly decreasing over t∈(0,∞), and

$\lim\limits_{t\rightarrow 0^{+}} g(t) = \frac{K - 1}{2}\quad\text{and}\quad\lim\limits_{t\rightarrow\infty} g(t) = 0.$

Computing the derivative of L(K,λ) with respect to λ and setting it to 0 yields

$\begin{matrix} {\frac{e^{- q/\lambda}}{1 - e^{- q/\lambda}} - K\frac{e^{- Kq/\lambda}}{1 - e^{- Kq/\lambda}} - C = 0} & (37) \end{matrix}$

where

$C = \frac{1}{|N_{1}(K)|}\sum\limits_{j \in N_{1}(K)}\left( |u_{j}| - 1 \right).$

In view of Lemma 1, it then follows that (1) when C=0, λ_(K)=0; (2) when C≧(K−1)/2, λ_(K)=∞; and (3) when 0<C<(K−1)/2, λ_(K) is the unique solution to (37). In Case (3), the iterative procedure described in Algorithm 3 (FIG. 14) can be used to find the unique root of (37).

Combining the above derivations together, we get a complete procedure for computing the ML estimate (b*, p*, λ*, K*) of (b, p, λ, K) in the GMTCM, which is described in Algorithm 4 (FIG. 15).

Remark 2: When implementing Algorithm 4 for actual DCT indices {u_(i): 1≦i≦n} with flat tail, there is no need to start Algorithm 4 with K=1. Instead, one can first choose K₀ such that |N₂ (K₀)| is a fraction of n and then run Algorithm 4 for K∈[K₀, a]. In our experiments, we have found that choosing K₀ such that |N₂(K₀)| is around 20% of n is a good choice.

4.2.2 Convergence and Complexity Analysis

In parallel with Algorithm 2, Algorithm 3 also converges exponentially fast when 0<C<(K−1)/2. In particular, we have the following result, which is proved in Appendix B (below).

Theorem 3: Assume that 0<C<(K−1)/2. Then λ^((i)) computed in Step 12 of Algorithm 3 strictly increases and converges exponentially fast to λ_(K) as i→∞.

The complexity of computing the ML estimate of the GMTCM parameters comes from two parts. The first part is to evaluate the cost of (34) over a set of K. The second part is to compute λ_(K) for every K using Algorithm 3. Note that C in Algorithm 3 can be easily pre-computed for interesting values of K. Thus, the main complexity of Algorithm 3 is to evaluate the two simple equations in (38) a small number of times in light of the exponential convergence, which is generally negligible. Essentially, the major complexity for the parameter estimation by Algorithms 3 and 4 is to collect the data histogram {h_(j), j=1, . . . , a} once. Compared with the complexity of solving (2) for GG parameters, where the data samples and the parameters to be estimated are closely tied together as in the Σ_(i=1) ^(n)|x_(i)|^(β) log |x_(i)| term and the βΣ_(i=1) ^(n)|x_(i)|^(β) term, the complexity of parameter estimation in the case of GMTCM is significantly lower.
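As an illustration of why one pass over the histogram suffices, the following Python sketch pre-computes the constant C of (37) for every K from 1 to a using running sums. It assumes that h_(j) counts the indices whose magnitude equals j and that N₁(K) contains the nonzero indices with magnitude at most K; the function name `precompute_C` is hypothetical.

```python
from collections import Counter

def precompute_C(u, a):
    """Return (h, C): h[j-1] is the count of |u_i| == j, C[K] is the constant in (37).

    Assumption: N1(K) = {i : 0 < |u_i| <= K}; K values with empty N1(K) are skipped.
    """
    counts = Counter(abs(x) for x in u)
    h = [counts.get(j, 0) for j in range(1, a + 1)]
    C = {}
    size = 0        # running |N1(K)|
    weighted = 0    # running sum of (|u_j| - 1) over N1(K)
    for K in range(1, a + 1):
        size += h[K - 1]
        weighted += h[K - 1] * (K - 1)
        if size > 0:
            C[K] = weighted / size
    return h, C
```

After the histogram is built, each additional K costs a constant number of operations, which is consistent with the complexity argument above.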

Remark 3: In our discussion on TCMs for DCT coefficients so far, DCT coefficients are separated into two portions: the main portion and the tail portion. As will be apparent to one skilled in the art, the main portion could be further separated into multiple sub-portions with each sub-portion modeled by a different parametric distribution. The resulting TCM would be called a multiple segment TCM (MTCM), described in greater detail below. In addition, the tail portion could be modeled by another parametric distribution such as a truncated Laplacian, GG, or geometric distribution as well, since a uniform distribution is a degenerate Laplacian, GG, or geometric distribution.

Remark 4: Although we have used both continuous and discrete DCT coefficients as our data examples, all TCM models discussed so far apply equally well to other types of data such as wavelet transform coefficients, prediction residuals arising from prediction in predictive coding and other prediction applications, and data which is traditionally modeled by Laplacian distributions.

5 Experimental Results on Tests of Modeling Accuracy

This section presents experimental results obtained from applying TCMs to both continuous and discrete DCT coefficients and compares them with those from the Laplacian and GG models.

5.1 Test Materials and Performance Metric

Two criteria are applied in this disclosure to test modeling accuracy: the χ² test, as defined in (6), and the divergence distance test defined as follows

$\begin{matrix} {d = \sum\limits_{i = 1}^{I} p_{i}\ln\frac{p_{i}}{q_{i}},} & (40) \end{matrix}$ where I is the number of intervals into which the sample space is partitioned in the continuous case or the alphabet size of a discrete source, p_(i) represents probabilities observed from the data, and q_(i) stands for probabilities obtained from a given model. Note that p_(i)=0 is dealt with by defining 0 ln 0=0.
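The divergence distance in (40) can be computed as in the following minimal Python sketch. The function name `divergence_distance` is hypothetical, and returning infinity when the model assigns zero probability to an observed symbol is an assumption made here for completeness; the text only specifies the convention 0 ln 0=0.

```python
import math

def divergence_distance(p, q):
    """Compute d = sum_i p_i * ln(p_i / q_i), with 0 * ln 0 taken as 0."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    d = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # 0 ln 0 = 0 by convention
        if qi == 0:
            return math.inf     # assumption: model misses an observed symbol
        d += pi * math.log(pi / qi)
    return d
```

Here `p` would hold the empirical probabilities from the data and `q` the probabilities given by the fitted model over the same bins or alphabet.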

Three sets of testing images are deliberately selected to cover a variety of image content. The first set, as shown in FIG. 6, includes 9 standard 512×512 images with faces, animals, buildings, landscapes, etc., referred to as, from left to right and row by row, ‘bird’, ‘boat’, ‘fish’, ‘couple/Cp’, ‘hill’, ‘lenna’, ‘baboon/Bb’, ‘mountain/Bt’, and ‘pepper/Pp’, respectively. The second set, as shown in FIG. 7, has five high definition (1920×1080) frames selected from the first frame of each of the class-B sequences used for HEVC standardization tests [3], and referred to as, from left to right, ‘B1’, ‘B2’, ‘B4’, and ‘B5’, respectively. The third set, as shown in FIG. 8, is taken from the first frame of four class-F sequences used for HEVC screen content tests, and referred to as, from left to right, ‘SE’, ‘SS’, ‘CS’, and ‘BbT’, respectively.

Tests for continuous DCT coefficients were conducted by computing the 8×8 DCT using floating point matrix multiplication. In our tests for discrete DCT coefficients, a raw image was first compressed using a Matlab JPEG codec with quality factors (QF) of 100, 90, 80, and 70; the resulting quantized DCT coefficients and corresponding quantization step sizes were then read from the obtained JPEG files.

Tests were carried out for five different models: the Laplacian model, GG model, GGTCM, LPTCM, and GMTCM. Due to its high computation complexity, GGTCM was applied only to continuous DCT coefficients. On the other hand, GMTCM is applicable only to discrete coefficients. The Laplacian and GG models were applied to both continuous and discrete DCT coefficients; the same parameter estimation algorithms, (4) for the Laplacian model and (2) for the GG model, were used for both continuous and discrete DCT coefficients.

5.2 Overall Comparisons for Each Image

In the continuous case, the GGTCM outperforms the GG model, the LPTCM outperforms the Laplacian model, and the GG model outperforms the Laplacian model in general, as one would expect. An interesting comparison in this case is between the GG model and LPTCM. Table 1 (FIG. 16) shows the percentage w_(χ²) of frequencies among 63 AC positions that are in favor of the LPTCM over the GG model for each of 9 images in Set 1 in terms of the χ² metric. For example, for the image ‘bird’, in terms of the χ² metric, the LPTCM is better than the GG model for 62 out of 63 frequencies; for the image ‘lenna’, the LPTCM is better than the GG model for 42 out of 63 frequencies. Overall, it would be fair to state that the LPTCM and GG model behave similarly in terms of modeling accuracy. And yet, the LPTCM has much lower computation complexity than the GG model.

In the discrete case, comparisons were conducted among the GMTCM, GG model, and Laplacian model in terms of both the divergence distance and χ² value. As expected, the GMTCM is always better than the Laplacian model according to both the divergence distance and χ² value, and hence the corresponding results are not included here. For the comparison between the GMTCM and GG model, results are shown in Tables 2, 3, 4, and 5 for quantized DCT coefficients from JPEG coded images with various QFs, where w_(d) stands for the percentage of frequencies among all tested AC positions that are in favor of the GMTCM over the GG model in terms of the divergence distance, and w_(χ²) has a similar meaning but in terms of the χ² value. In Table 2, all 63 AC positions were tested; in Tables 3, 4, and 5, all AC positions with 6 or more different non-zero AC coefficient magnitudes were tested. These tables show that when all quantization step sizes are 1, corresponding to QF=100, the comparison between the GMTCM and GG model is similar to that between the LPTCM and GG model, i.e., their performances are close to each other. However, as the quantization step sizes increase, the GMTCM starts to outperform the GG model significantly, as shown in Tables 3, 4, and 5, for all tested images.

5.3 Comparisons of χ² Among Three Models for Individual Frequencies

In the above overall comparisons, Table 2 (FIG. 17) shows that the GMTCM and GG model are close, while the GMTCM wins the majority over the GG model for all other cases as shown in Tables 3-5 (FIGS. 18-20). We now zoom in to look at the χ² values for all tested frequency positions for several representative images: (1) ‘bird’ which is strongly in favor of the GMTCM in Table 2; (2) ‘CS’ which is strongly in favor of the GG model in Table 2; and (3) ‘boat’ for which the GMTCM and GG model tie more or less in Table 2. The corresponding χ² values are presented in Tables 6, 7, and 8. Table 6 (FIG. 21) shows the χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘bird’ with QF=100. Table 7 (FIG. 22) shows the χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘boat’ with QF=100. Table 8 (FIG. 23) shows the χ² distances by the GG model, GMTCM, and Laplacian model for all 63 ACs from JPEG-coded image ‘CS’ with QF=100.

From Tables 6, 7, and 8, it is fair to say that (1) the GMTCM dramatically improves the modeling accuracy over the Laplacian model; (2) when the GMTCM is better than the GG model, χ_(GMTCM) ² is often much smaller, up to 15658 times smaller, than χ_(GGD) ²; and (3) when the GG model is better than the GMTCM, the difference between χ_(GMTCM) ² and χ_(GGD) ² is not as significant as one would see in Case (2); for example, in Table 8, χ_(GGD) ² is only up to 9 times smaller than χ_(GMTCM) ².

Another interesting result is observed in Table 9 (FIG. 24), which shows the χ² values for the JPEG coded ‘CS’ image with QF=90. Compared with the case where the source is JPEG coded with the smaller step sizes of QF=100 as shown in Table 8, most ACs now show better modeling accuracy by the GMTCM than by the GG model when the quantization step size increases.

6 Applications

This section briefly discusses applications of TCM in various areas such as data compression and image understanding. For example, as shown in FIG. 25, a computer device 2500 can be used to implement the methods of example embodiments described herein. The computer device 2500 can include a controller 2502 or processor operably coupled to, for example, a memory 2504, a communication subsystem 2506, a display 2508, and other input or output devices 2510. The controller can include modules configured to implement an encoder 2512 and/or a decoder 2514, in accordance with example embodiments. The communication subsystem 2506 can be used to access DCT coefficients stored in a second device or server, for example. The communication subsystem 2506 can be used to send communications to another device.

6.1 Data Compression

As DCT is widely used in image/video compression, e.g. in JPEG, H.264, and HEVC, an accurate model for DCT coefficients would be helpful to further improvement in compression efficiency, complexity, or both in image/video coding.

6.1.1 Lossless Coding Algorithm Design

Entropy coding design in image and video coding such as JPEG, H.264 and HEVC is closely related to understanding the DCT coefficient statistics, due to the wide application of DCT in image and video compression. The superior modeling accuracy by TCM has been utilized by us to design an entropy coding scheme for discrete DCT coefficients (such as in JPEG images). Specifically, GMTCM parameters are calculated and coded for each frequency. Then, a bit-mask is coded to identify outliers, so that outliers and DCT coefficients within the main portion can be further coded separately with their respective context modeling. For DCT coefficients within the main portion, parameters of the truncated geometric distributions are encoded and then used to further improve the coding efficiency. In spite of the overhead for coding outlier flags, the new entropy codec shows on average 25% rate saving when compared with a standard JPEG entropy codec for high fidelity JPEG images (with quantization step size being 1 for most low frequency AC positions), which is significantly better than other state-of-the-art lossless coding methods for DCT coefficients [15] and for gray-scale images [16]. A suitable decoder can implement at least some or all of the functions of the encoder, as an inverse.

6.1.2 Lossy Coding Algorithm Design

Quantization design, as the core of lossy coding, is rooted in rate distortion theory, which generally requires a statistical model to provide guidance to practical designs. Quantization design in DCT-based image and video coding usually assumes a Laplacian distribution due to its simplicity and fair modeling accuracy [12]. Since the LPTCM improves dramatically upon the Laplacian model in terms of modeling accuracy while having similar simplicity, it has been applied by us to design quantizers for DCT coefficients and a DCT-based non-predictive image compression system, which is significantly better than JPEG and the state-of-the-art DCT-based non-predictive image codec [14] in terms of compression efficiency and compares favorably with the state-of-the-art DCT-based predictive codecs such as H.264/AVC intra coding and HEVC intra coding in high rate cases in terms of the trade-off between compression efficiency and complexity.

For example, as shown in FIG. 25, the controller 2502 can be configured to implement an encoder 2512. For example, an image/video encoder can include three steps: forward DCT (FDCT), quantization, and lossless encoding. The encoder first partitions an input image into 8×8 blocks and then processes these 8×8 image blocks one by one in raster scan order. Each block is first transformed from the pixel domain to the DCT domain by an 8×8 FDCT. A TCM model having respective parameters is generated which models the DCT coefficients. The resulting DCT coefficients are then quantized based on the determined TCM model. In an example embodiment, as shown, the quantization can be optimized using the determined TCM model. The DCT indices from the quantization are encoded in a lossless manner, for example, based on the determined TCM model again. The encoded DCT indices along with parameters of the determined TCM model are finally either saved into a compressed file or sent to the decoder. If the original input image is a multiple component image such as an RGB color image, the pipeline process of FDCT, quantization, and lossless encoding is conceptually applied to each of its components (such as its luminance component Y and chroma components Cr and Cb in the case of RGB color images) independently.

6.2 Image Understanding

Image understanding is another application for DCT coefficient modeling. It is interesting to observe that in natural images the statistically insignificant outliers detected by the GMTCM carry perceptually important information, which sheds light on DCT-based image analysis.

6.2.1 Featured Outlier Images Based on GMTCM

One important parameter in the GMTCM model is the cutting point y_(c)=Kq between a parametric distribution for the main portion and the uniform distribution for the flat tail portion. Statistically, the outlier coefficients that fall beyond y_(c) into the tail portion are not significant. Although the actual number of outliers varies from one frequency to another and from one image to another, it typically ranges in our experiments from less than 0.1% of the total number of AC coefficients to 19%, with an average around 1.2%. However, from the image quality perception perspective, the outliers carry very important information, as demonstrated by FIGS. 9, 10 and 11.

FIGS. 9-11 each include an original image, a so-called inlier image, and a so-called outlier image. An inlier image is generated by first forcing all outlier coefficients to zero and then performing the inverse DCT. An outlier image, on the other hand, is generated by first keeping only outliers, forcing all other DCT coefficients to zero, and then performing the inverse DCT. Three original images are taken from the three test sets with one from each set to show the perceptual importance of their respective outliers.
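The construction of inlier and outlier images can be sketched for a single 8×8 block as follows. This is a minimal pure-Python illustration under several assumptions: an orthonormal DCT-II basis is used, the cut point y_(c) is supplied per frequency position as an 8×8 table `yc`, a coefficient counts as an outlier when its magnitude strictly exceeds y_(c), and the DC coefficient is always kept with the inliers. The names `split_block` and `idct2` are hypothetical.

```python
import math

N = 8
# Orthonormal DCT-II basis: D[k][n] is basis function k sampled at position n.
D = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]

def idct2(coef):
    """Inverse 2-D DCT of one 8x8 block: X = D^T * coef * D."""
    tmp = [[sum(D[k][i] * coef[k][l] for k in range(N)) for l in range(N)]
           for i in range(N)]
    return [[sum(tmp[i][l] * D[l][j] for l in range(N)) for j in range(N)]
            for i in range(N)]

def split_block(coef, yc):
    """Split a block of DCT coefficients into inlier and outlier blocks.

    Outliers are AC coefficients with |coef| > yc at that frequency;
    every position is zeroed in exactly one of the two output blocks.
    """
    inlier = [[0.0] * N for _ in range(N)]
    outlier = [[0.0] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            is_dc = (r == 0 and c == 0)
            if not is_dc and abs(coef[r][c]) > yc[r][c]:
                outlier[r][c] = coef[r][c]
            else:
                inlier[r][c] = coef[r][c]
    return inlier, outlier
```

Applying `idct2` to the two blocks gives the corresponding pixel-domain blocks of the inlier and outlier images. Because the inverse DCT is linear, the two results sum to the inverse DCT of the original block.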

As the inlier image contains all DC components and inlier AC components, a down-sizing operation would impact our perception of the difference between the original image and the inlier image. Hence, FIGS. 9-11 are presented at as large a size as possible. In FIG. 9, the outlier image captures most structural information such as the railing. In FIG. 10, the outlier image shows a fine sketch of the face, while the inlier image with all statistically significant coefficients shows an undesired quality, particularly with the blurring of eyes. In FIG. 11, the basketball net is well sketched in the outlier image, but is much blurred in the inlier image. From these figures, it is evident that the tail portion is perceptually important. This, together with the statistical insignificance of outliers, makes the outlier image appealing to image understanding. On one hand, compared with the original image, the outlier image achieves dramatic dimension reduction. On the other hand, due to the preservation of perceptually important global information of the original image in the outlier image, some aspects of image understanding can be carried out instead from the outlier image with perhaps better accuracy and less computation complexity.

It is interesting to show the information rate for outliers, i.e., how many bits are needed to represent outlier images. We have also applied TCM to enhance entropy coding design for DCT coefficients, where outliers are encoded separately from inliers. It is observed that outliers only consume about 5% of the total bits.

Finally, it is worthwhile to point out that outlier images are related to, but different from, conventional edge detection. An outlier image captures some global uniqueness in an image, while edges are usually detected based on local irregularity in the pixel domain. For example, the large area of vertical patterns on the left-top corner of FIG. 9 is not captured as outliers because those vertical patterns repeat themselves many times in the image, while it shows up as edges.

6.2.2 Image Similarity

Similarity measurement among images plays a key role in image management, which attracts more and more attention in industry nowadays due to the fast growth of digital photography in the past decade. One application of DCT models is to measure the similarity among images by estimating the model parameters of different images and calculating a distribution distance. Because DCT coefficients capture some spatial patterns in the pixel domain well, e.g., AC₁ reflecting a vertical pattern and AC₈ preserving a horizontal pattern, the distribution distance between DCT coefficient models represents the similarity between two images well. This type of similarity measurement is evidently rooted in the data histogram. Yet, in practice, the histogram itself is not a good choice, as it requires a flat overhead. This is particularly problematic for a large scale image management system. On the other hand, model-based distribution distances use only a few parameters with negligible overhead, thus providing a good similarity measurement between digital images, particularly when the modeling accuracy is high. The inventors have studied the use of the GMTCM for image similarity along this line, with promising performance.

The outlier images shown and discussed in Subsection 6.2.1 can be used to further enhance image similarity testing based on model-based distribution distances. Since outliers are insignificant statistically, their impact on model-based distribution distances may not be significant. And yet, if two images look similar, their respective outlier images must look similar too. As such, one can build other metrics based on outlier images to further enhance image similarity testing. In addition, an outlier image can also be used to detect whether a given image is scenic and to help improve face detection. These and other applications using the GMTCM are contemplated as being within the scope of the present disclosure.

Reference is now made to FIG. 26, which shows an example method 2600 for modelling a set of transform coefficients, for example implemented by the device 2500 (FIG. 25) or a plurality of devices, in accordance with an example embodiment. At event 2602, the method 2600 includes determining at least one boundary coefficient value, for example using maximum likelihood estimation 2610. At event 2604, the method 2600 includes determining one or more parameters of a first distribution model, for example a uniform distribution model or other model, for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values. At event 2606, the method 2600 includes determining parameters of at least one further distribution model, such as at least one parametric distribution model, for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values. The events 2602, 2604, 2606 are illustrated with double-arrows because of the co-dependence between the variables or parameters of the events 2602, 2604, 2606, including any iterative processes.

From the method 2600, a composite distribution model can be defined as a composite of the first distribution model (e.g. uniform distribution model) and the at least one further distribution model having the respective determined parameters. At event 2608, the method 2600 includes performing a device operation on at least part of the composite distribution model. For example, the device operation may be implemented on one of the distribution models but not the others. In an example embodiment, the device operation is performed on the entire composite distribution model.

In an example embodiment, the at least one parametric distribution model includes at least one of: a Laplacian distribution model, a generalized Gaussian model, and a geometric distribution model.

Referring to event 2608, in some example embodiments, the device operation includes at least one of storing on a memory, transmitting to a second device, transmitting to a network, outputting to an output device, displaying on a display screen, improving data compression of the set of transform coefficients using the composite distribution model, determining image similarity between different images by comparing at least part of the composite distribution model, determining a goodness-of-fit between the composite distribution model and the set of transform coefficients, and generating an identifier which associates the composite distribution model with the set of transform coefficients.

The set of transform coefficients includes: discrete cosine transform coefficients, Laplace transform coefficients, Fourier transform coefficients, wavelet transform coefficients, prediction residuals arising from prediction in predictive coding and other prediction applications, or data which is traditionally modeled by Laplacian distributions. The set of transform coefficients can be generated in real-time (e.g. from a source image or media file), obtained from the memory 2504 (FIG. 25) or from a second device.

Reference is still made to FIG. 26, which illustrates another example method 2600 for a set of transform coefficients using TCM, for example implemented by the device 2500 (FIG. 25), in accordance with an example embodiment. In an example embodiment, the method 2600 can be used to filter the set of transform coefficients which are bounded by one of the boundary coefficient values, for example.

At event 2610, the method 2600 includes determining at least one boundary coefficient value which satisfies a maximum likelihood estimation between the set of transform coefficients and a composite distribution model which is a composite of a plurality of distribution models each for a subset of transform coefficients of the set bounded by each of the at least one boundary coefficient values. This can include determining one or more parameters of a first distribution model, for example at least one uniform distribution model, for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values. This can include determining parameters of at least one further distribution model, such as at least one parametric distribution model, for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values. The maximum likelihood estimation at event 2610 is illustrated with double-arrows because of the co-dependence between the variables or parameters between the at least one boundary coefficient value and the distribution models.

At event 2608, the method 2600 includes performing a device operation on at least one of the subsets of transform coefficients. In some example embodiments, the device operation on the at least one of the subsets of transform coefficients includes at least one of: encoding, storing on a memory, transmitting to a second device, transmitting to a network, outputting to an output device, decoding, displaying a decoded version on a display screen, determining image similarity between different images by comparison of the at least one of the subsets of discrete transform coefficients, and generating an identifier which associates the composite distribution model with the at least one of the subsets of discrete transform coefficients.

Still referring to event 2608, the device operation on the subset of coefficients can be used to filter an image using the boundary coefficient value, for example maintaining at least one subset bounded by the boundary coefficient value and setting the remaining subsets of coefficient values to a zero value. The remaining subset(s) can then be decoded and displayed on a display, for example. This has been illustrated in detail with respect to FIGS. 9-11, for example.

7 Conclusions to TCM

Motivated by the flat tail phenomenon in DCT coefficients and its perceptual importance, this disclosure has developed a model dubbed transparent composite model (TCM) for modeling DCT coefficients, which separates the tail portion of DCT coefficients from the main portion of DCT coefficients and uses a different distribution to model each portion: a uniform distribution for the tail portion and a parametric distribution such as truncated Laplacian, generalized Gaussian (GG), and geometric distributions for the main portion. Efficient online algorithms with global convergence have been developed to compute the ML estimates of the parameters in the TCM. It has been shown that among the Laplacian model, GG model, GGTCM, and LPTCM, the GGTCM offers the best modeling accuracy for real-valued DCT coefficients at the cost of large extra complexity. On the other hand, for discrete DCT coefficients, tests over a wide variety of images based on both the divergence distance and the χ² test have shown that the GMTCM outperforms both the Laplacian and GG models in terms of modeling accuracy in the majority of cases while having simplicity and practicality similar to those of the Laplacian model, thus making the GMTCM a desirable choice for modeling discrete DCT coefficients in real-world applications. In addition, it has been demonstrated that the tail portion identified by the GMTCM gives rise to an image called an outlier image, which, on one hand, achieves dramatic dimension reduction in comparison with the original image, and on the other hand preserves perceptually important unique global features of the original image. It has been further suggested that the applications of the TCM, in particular the LPTCM and GMTCM, include image and video coding, quantization design, entropy coding design, and image understanding and management (image similarity testing, scenic image blind detection, face detection, etc.).

Appendix A

In this appendix, we prove Lemma 1.

First note that g(t) can be rewritten as

$g(t) = K - 1 + \frac{1}{1 - e^{-t}} - \frac{K}{1 - e^{-Kt}}.$ Its derivative is equal to

$\begin{matrix} {g^{\prime}(t) = \frac{-e^{-t}}{\left(1 - e^{-t}\right)^{2}} + \frac{K^{2}e^{-Kt}}{\left(1 - e^{-Kt}\right)^{2}}} \\ {= -\frac{e^{-t}}{\left(1 - e^{-Kt}\right)^{2}}\left\lbrack \frac{\left(1 - e^{-Kt}\right)^{2}}{\left(1 - e^{-t}\right)^{2}} - K^{2}e^{-(K-1)t} \right\rbrack} \\ {= -\frac{e^{-t}}{\left(1 - e^{-Kt}\right)^{2}}\left\lbrack \left(\sum\limits_{i = 0}^{K - 1} e^{-it}\right)^{2} - K^{2}e^{-(K-1)t} \right\rbrack} \\ {= -\frac{e^{-t}}{\left(1 - e^{-Kt}\right)^{2}}\left\lbrack \left(\sum\limits_{i = 0}^{K - 1} e^{-it}\right) - Ke^{-(K-1)t/2} \right\rbrack \cdot \left\lbrack \left(\sum\limits_{i = 0}^{K - 1} e^{-it}\right) + Ke^{-(K-1)t/2} \right\rbrack.} & (41) \end{matrix}$

It is not hard to verify that

$\left(\sum\limits_{i = 0}^{K - 1} e^{-it}\right) - Ke^{-(K-1)t/2} = \sum\limits_{i = 0}^{K_{L}} \left(e^{-it/2} - e^{-\frac{K - 1 - i}{2}t}\right)^{2} > 0$ whenever K>1, where

$K_{L} = \text{floor}\left(\frac{K}{2}\right) - 1.$ This, together with (41), implies that g′(t)<0 for any t>0 whenever K>1. Hence g(t) is strictly decreasing over t∈(0, ∞).
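The sum-of-squares identity used above can be checked as follows; this short expansion is offered as a reader aid and is not part of the original proof. Expanding each square gives

$\left(e^{-it/2} - e^{-\frac{K - 1 - i}{2}t}\right)^{2} = e^{-it} + e^{-(K - 1 - i)t} - 2e^{-(K-1)t/2}.$

Summing over i=0, . . . , K_(L) pairs the term e^(−it) with the term e^(−(K−1−i)t). When K is even, the K/2 pairs cover every index from 0 to K−1 exactly once, and the cross terms add up to −Ke^(−(K−1)t/2). When K is odd, the (K−1)/2 pairs cover every index except the middle one, i=(K−1)/2, whose term is e^(−(K−1)t/2) itself; the cross terms add up to −(K−1)e^(−(K−1)t/2), and subtracting the missing middle term again yields −Ke^(−(K−1)t/2). Each square is strictly positive for t>0 because i≠K−1−i for every i≤K_(L).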

Next we have

$\begin{matrix} {\lim\limits_{t \to 0^{+}} g(t) = \lim\limits_{t \to 0^{+}} \frac{e^{-t}}{1 - e^{-Kt}}\left\lbrack \sum\limits_{i = 0}^{K - 1} e^{-it} - K \cdot e^{-(K-1)t} \right\rbrack} \\ {= \lim\limits_{t \to 0^{+}} \frac{\sum\limits_{i = 0}^{K - 1} e^{-it} - K \cdot e^{-(K-1)t}}{1 - e^{-Kt}}} \\ {= \frac{1}{2}\left(K - 1\right).} \end{matrix}$
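The last equality can be seen from a first-order expansion around t=0, sketched here as a reader aid rather than as part of the original text. Using e^(−x)=1−x+O(x²),

$\sum\limits_{i = 0}^{K - 1} e^{-it} - Ke^{-(K-1)t} = \left(K - \frac{K(K-1)}{2}t\right) - K\left(1 - (K-1)t\right) + O(t^{2}) = \frac{K(K-1)}{2}t + O(t^{2}),$

while $1 - e^{-Kt} = Kt + O(t^{2})$, so the ratio tends to (K−1)/2 as t→0⁺.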

Finally, the strict concavity of

$L\left(K,\frac{q}{t}\right)$ as a function of t follows from (41) and the fact that

$\frac{\partial^{2}L\left(K,\frac{q}{t}\right)}{\partial t^{2}} = |N_{1}(K)|\,g^{\prime}(t).$

This completes the proof of Lemma 1.

Appendix B

In this appendix, we prove Theorem 3.

First, arguments similar to those in the proof of Theorem 2 can be used to show that λ^((i)) is upper bounded by λ_(K), strictly increases, and converges to λ_(K) as i→∞. Therefore what remains is to show that the convergence is exponentially fast. To this end, let

$h(\lambda) \overset{\Delta}{=} \frac{e^{-q/\lambda}}{1 - e^{-q/\lambda}}.$

In view of (38), it follows that

$\begin{matrix} {h\left(\lambda^{(i+1)}\right) = C + \frac{K}{e^{Kq/\lambda^{(i)}} - 1}} \\ {= C + K\,h\left(\lambda^{(i)}/K\right)} \end{matrix}$ and hence $\begin{matrix} {h\left(\lambda^{(i+1)}\right) - h\left(\lambda^{(i)}\right) = K\,h\left(\lambda^{(i)}/K\right) - K\,h\left(\lambda^{(i-1)}/K\right)} \\ {= \frac{K\,h\left(\lambda^{(i)}/K\right) - K\,h\left(\lambda^{(i-1)}/K\right)}{h\left(\lambda^{(i)}\right) - h\left(\lambda^{(i-1)}\right)}\left\lbrack h\left(\lambda^{(i)}\right) - h\left(\lambda^{(i-1)}\right) \right\rbrack} \\ {\leq \delta\left\lbrack h\left(\lambda^{(i)}\right) - h\left(\lambda^{(i-1)}\right) \right\rbrack} \end{matrix}$ where $\delta = \sup\left\{ \frac{K\,h(\lambda/K) - K\,h(v/K)}{h(\lambda) - h(v)} : \lambda, v \in \left\lbrack \lambda^{(0)}, \lambda_{K} \right\rbrack, \lambda \neq v \right\}.$

In view of Lemma 1 and its proof (particularly (41)), it is not hard to verify that 0<δ<1. Therefore, as i→∞, h(λ^((i))) converges to h(λ_(K)) exponentially fast. Since the derivative of h(λ) is positive over λ∈[λ⁽⁰⁾,λ_(K)] and bounded away from 0, it follows that λ^((i)) also converges to λ_(K) exponentially fast. This completes the proof of Theorem 3.
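The recursion h(λ^((i+1)))=C+K h(λ^((i))/K) used in this proof can be iterated directly, as in the following Python sketch. Since Algorithm 3 (FIG. 14) is not reproduced here, the starting point is an assumption: λ⁽⁰⁾ is chosen so that h(λ⁽⁰⁾)=C, which lies below λ_(K) because h is increasing. The function names and the overflow guard are likewise hypothetical, and 0<C<(K−1)/2 is assumed.

```python
import math

def h(lam, q):
    """h(lam) = e^(-q/lam) / (1 - e^(-q/lam)) = 1 / (e^(q/lam) - 1)."""
    x = q / lam
    return 1.0 / math.expm1(x) if x < 700.0 else 0.0

def h_inverse(y, q):
    """Solve h(lam) = y for lam, for y > 0: lam = q / ln(1 + 1/y)."""
    return q / math.log1p(1.0 / y)

def iterate_lambda(C, K, q, steps=100):
    """Return the sequence lam^(0), lam^(1), ... generated by the recursion."""
    lam = h_inverse(C, q)           # assumed starting point with h(lam^(0)) = C
    seq = [lam]
    for _ in range(steps):
        lam = h_inverse(C + K * h(lam / K, q), q)
        seq.append(lam)
    return seq
```

In line with Theorem 3, the successive differences of the generated sequence shrink geometrically until they reach floating-point precision.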

8 Introduction to MTCM

The above example embodiments have shown that (1) for real-valued continuous AC coefficients, LPTCM offers a superior trade-off between modeling accuracy and complexity; and (2) for discrete (integer or quantized) DCT coefficients, which are mostly seen in real-world applications of DCT, GMTCM models AC coefficients more accurately than the Laplacian model and GG model in the majority of cases while having simplicity and practicality similar to those of the Laplacian model. When limited to AC coefficients at low frequencies, however, GMTCM only ties with the GG model in terms of modeling accuracy. Since DCT coefficients at low frequencies are generally more important than those at high frequencies to human perception, it would be advantageous to further improve the modeling accuracy of LPTCM and GMTCM for low frequency DCT coefficients without sacrificing modeling simplicity and practicality.

In accordance with at least some example embodiments, we extend the concept of TCM by further separating the main portion of DCT coefficients into multiple sub-portions and modeling each sub-portion by a different parametric distribution (such as truncated Laplacian, GG, and geometric distributions). The resulting model is dubbed a multiple segment TCM (MTCM). In the case of general MTCMs based on truncated Laplacian and geometric distributions (referred to as MLTCM and MGTCM, respectively), a greedy algorithm is developed for determining a desired number of segments and for estimating the corresponding separation boundaries and other MTCM parameters. For bi-segment TCMs, an efficient online algorithm is further presented for computing the maximum likelihood (ML) estimate of their parameters. Experiments based on Kullback-Leibler (KL) divergence and χ² test show that (1) for real-valued continuous AC coefficients, the bi-segment TCM based on truncated Laplacian (BLTCM) models AC coefficients more accurately than LPTCM and the GG model while having simplicity and practicality similar to those of LPTCM and pure Laplacian; and (2) for discrete DCT coefficients, the bi-segment TCM based on truncated geometric distributions (BGTCM) significantly outperforms GMTCM and the GG model in terms of modeling accuracy, while having simplicity and practicality similar to those of GMTCM. Also shown is that the MGTCM derived by the greedy algorithm further improves the modeling accuracy over BGTCM at the cost of more parameters and slight increase in complexity. Therefore, BLTCM/MLTCM and BGTCM/MGTCM represent the state of the art in terms of modeling accuracy for continuous and discrete DCT coefficients (or similar type of data), respectively, which, together with their simplicity and practicality, makes them a desirable choice for modeling DCT coefficients (or similar type of data) in real-world image/video applications.

9 Review of TCM

In this section, we briefly review the concept of TCM for continuous DCT coefficients, as described in detail above.

Let f(y|θ) be a probability density function (pdf) with parameter θ∈Θ, where θ could be a vector, and Θ is the parameter space. Let F(y|θ) be the corresponding cumulative distribution function (cdf), i.e. $F(y|\theta) \triangleq \int_{-\infty}^{y} f(u|\theta)\,du$.

Equation numbers will re-start from (1) for convenience of reference.

Assume that f(y|θ) is symmetric in y with respect to the origin, and F(y|θ) is concave as a function of y in the region y≧0. It is easy to verify that Laplacian, Gaussian, and GG distributions all satisfy this assumption. The continuous TCM based on F(y|θ) is defined as

$p(y|y_{c},b,\theta) \triangleq \begin{cases} \frac{b}{2F(y_{c}|\theta) - 1}\, f(y|\theta) & \text{if } |y| < y_{c} \\ \frac{1 - b}{2(a - y_{c})} & \text{if } y_{c} < |y| \leq a \\ \max\left\{ \frac{b}{2F(y_{c}|\theta) - 1}\, f(y_{c}|\theta), \frac{1 - b}{2(a - y_{c})} \right\} & \text{if } |y| = y_{c} \\ 0 & \text{otherwise} \end{cases} \qquad (42)$ where 0≦b≦1, 0<d≦y_(c)<a, and a represents the largest magnitude a sample y can take. Here both a and d are assumed to be known. It is not hard to see that given (y_(c), b, θ), as a function of y, p(y|y_(c), b, θ) is indeed a pdf, and also symmetric with respect to the origin.

According to the TCM defined in (42), a sample y is generated according to the truncated distribution

$\frac{1}{2F(y_{c}|\theta) - 1}\, f(y|\theta)$ with probability b, and according to the uniform distribution

$\frac{1}{2(a - y_{c})}$ (also called the outlier distribution) with probability 1−b. The composite model is transparent since, given parameters (y_(c), b, θ), there is no ambiguity regarding which distribution a sample y≠±y_(c) comes from. The ML estimates of the separation boundary y_(c) and parameter (θ, b) can be computed efficiently through the online algorithm with global convergence developed in Sections 3 and 4 above, especially when f(y|θ) is Laplacian. As described in Sections 3 to 6 above, the value of b is on average around 0.99. As such, the portions modeled by the truncated distribution

$\frac{1}{2F(y_{c}|\theta) - 1}\, f(y|\theta)$ and the outlier distribution are referred to as the main and tail portions, respectively.
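As an illustration of the definition in (42), the following Python sketch evaluates the continuous TCM density for the special case where f(y|θ) is Laplacian with scale λ, so that 2F(y_(c)|λ)−1=1−e^(−y_(c)/λ). The function name and argument order are hypothetical, and the sketch assumes 0<y_(c)<a and λ>0.

```python
import math


def laplacian_tcm_pdf(y, y_c, b, lam, a):
    """Evaluate the continuous TCM density of (42) for a Laplacian f(y | lam)."""
    mag = abs(y)

    def main_part(m):
        # b / (2F(y_c) - 1) * f(m), with 2F(y_c) - 1 = 1 - exp(-y_c / lam).
        return b / (1.0 - math.exp(-y_c / lam)) * math.exp(-m / lam) / (2.0 * lam)

    # Uniform (outlier) density on y_c < |y| <= a.
    tail_part = (1.0 - b) / (2.0 * (a - y_c))

    if mag < y_c:
        return main_part(mag)
    if mag == y_c:
        return max(main_part(y_c), tail_part)
    if mag <= a:
        return tail_part
    return 0.0
```

Integrating this function numerically over [−a, a] should give a value close to 1, which is a quick sanity check on a chosen parameter set.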

10 Continuous Multiple Segment Transparent Composite Model

To improve the modeling accuracy of TCM, especially for AC coefficients at low frequencies, we now further separate the main portion of DCT coefficients into multiple sub-portions and model each sub-portion independently by a different parametric distribution, yielding a model we call a multiple segment transparent composite model. Assuming DCT coefficients are continuous (i.e. can take any real value), in this section we describe and analyze continuous MTCMs.

10.1 Description of General Continuous MTCMs

Separate the main portion further into l sub-portions. The MTCM based on F(y|θ) with l+1 segments is defined as

$p(y|\bar{y}_{c},\bar{b},\bar{\theta}) \triangleq \begin{cases} \frac{b_{1}}{2F(y_{c_{1}}|\theta_{1}) - 1}\, f(y|\theta_{1}) & \text{if } |y| < y_{c_{1}} \\ \frac{b_{2}}{2\left[F(y_{c_{2}}|\theta_{2}) - F(y_{c_{1}}|\theta_{2})\right]}\, f(y|\theta_{2}) & \text{if } y_{c_{1}} < |y| < y_{c_{2}} \\ \vdots & \\ \frac{b_{l}}{2\left[F(y_{c_{l}}|\theta_{l}) - F(y_{c_{l-1}}|\theta_{l})\right]}\, f(y|\theta_{l}) & \text{if } y_{c_{l-1}} < |y| < y_{c_{l}} \\ \frac{b_{l+1}}{2\left[F(a|\theta_{l+1}) - F(y_{c_{l}}|\theta_{l+1})\right]}\, f(y|\theta_{l+1}) & \text{if } y_{c_{l}} < |y| \leq a \\ 0 & \text{otherwise} \end{cases} \qquad (43)$ where $\bar{y}_{c} = (y_{c_{1}}, y_{c_{2}}, \ldots, y_{c_{l}})$ with $0 < d \leq y_{c_{1}} < y_{c_{2}} < \cdots < y_{c_{l}} < y_{c_{l+1}} = a$, $\bar{b} = (b_{1}, b_{2}, \ldots, b_{l+1})$ with $b_{i} \geq 0$ and $b_{1} + b_{2} + \cdots + b_{l+1} = 1$, $\bar{\theta} = (\theta_{1}, \theta_{2}, \ldots, \theta_{l+1})$, and

$p(y|\bar{y}_{c},\bar{b},\bar{\theta}) = \max\left\{ \frac{b_{i}}{2\left[F(y_{c_{i}}|\theta_{i}) - F(y_{c_{i-1}}|\theta_{i})\right]}\, f(y|\theta_{i}), \frac{b_{i+1}}{2\left[F(y_{c_{i+1}}|\theta_{i+1}) - F(y_{c_{i}}|\theta_{i+1})\right]}\, f(y|\theta_{i+1}) \right\}$ whenever $|y| = y_{c_{i}}$ for i=1, 2, . . . , l, with $y_{c_{0}} = 0$.

Note that in the MTCM defined in (43), the tail portion is also modeled by a truncated distribution based on f(y|θ). This deviation from the TCM defined in (42) is motivated by the observation that, given y _(c), the uniform distribution over $(y_{c_{l}}, a] \cup [-a, -y_{c_{l}})$ is actually the limiting distribution of

$\frac{1}{2\left[F(a|\theta_{l+1}) - F(y_{c_{l}}|\theta_{l+1})\right]}\, f(y|\theta_{l+1})$ as some parameter in θ_(l+1) goes to ∞ for most parametric distributions f(y|θ) such as the Laplacian, Gaussian, and GG distributions. Therefore, leaving θ_(l+1) to be determined by ML estimation would improve modeling accuracy in general.

Depending on f(y|θ), estimating the MTCM parameters y _(c), b, θ in a general case for arbitrary l may be difficult. In the following, we shall instead focus on special cases where l=1 or f(y|θ) is Laplacian, and develop accordingly effective ways for estimating y _(c), b, θ.

10.2 ML Estimates of Bi-Segment TCM Parameters

In the case of bi-segment TCM, we have l=1 and the parameters to be estimated are y_(c) ₁ , b₁, θ₁, and θ₂. To develop an attractive algorithm for computing the ML estimates of y_(c) ₁ , b₁, θ₁, and θ₂, we further assume that f(y|θ) is differentiable for y≧0 and F″(y|θ)[1−F(y|θ)]+[F′(y|θ)]²≧0  (44) for any y≧0. It is not hard to verify that the Laplacian, Gaussian, and GG distributions with the shape parameter β≦1 all satisfy (44).

Let Y₁ ^(n)=(Y₁, Y₂, . . . , Y_(n)) be a sequence of DCT coefficients in an image or in a large coding unit (such as a block, a slice or a frame in video coding) at a particular frequency or across frequencies of interest. Assume that Y₁ ^(n) behaves according to the MTCM defined in (43) with l=1 and with $Y_{\max} \triangleq \max\{|Y_{i}| : 1 \leq i \leq n\} < a$ and Y_(max)≧d. (When Y_(max)<d, the ML estimates of y_(c) ₁ and b₁ are equal to d and 1, respectively.) We next investigate how to compute the ML estimates of y_(c) ₁ , b₁, θ₁, and θ₂ under the condition (44).

Given Y₁ ^(n) with d≦Y_(max)<a, define

$N_{1}(y_{c_{1}}) \triangleq \{i : |Y_{i}| < y_{c_{1}}\}$, $N_{2}(y_{c_{1}}) \triangleq \{i : y_{c_{1}} < |Y_{i}|\}$, $N_{3}(y_{c_{1}}) \triangleq \{i : |Y_{i}| = y_{c_{1}}\}$.

Then the log-likelihood function g(y_(c) ₁ , b₁, θ|Y₁ ^(n)) according to (43) with l=1 is equal to

$\begin{aligned} g(y_{c_{1}}, b_{1}, \bar{\theta}|Y_{1}^{n}) ={} & \sum_{i \in N_{1}(y_{c_{1}})} \ln f(Y_{i}|\theta_{1}) + \sum_{i \in N_{2}(y_{c_{1}})} \ln f(Y_{i}|\theta_{2}) + |N_{1}(y_{c_{1}})| \ln b_{1} + |N_{2}(y_{c_{1}})| \ln(1 - b_{1}) \\ & + |N_{3}(y_{c_{1}})| \max\left\{ \ln\frac{b_{1} f(y_{c_{1}}|\theta_{1})}{2F(y_{c_{1}}|\theta_{1}) - 1}, \ln\frac{(1 - b_{1}) f(y_{c_{1}}|\theta_{2})}{2\left[F(a|\theta_{2}) - F(y_{c_{1}}|\theta_{2})\right]} \right\} \\ & - |N_{1}(y_{c_{1}})| \ln\left[2F(y_{c_{1}}|\theta_{1}) - 1\right] - |N_{2}(y_{c_{1}})| \ln 2\left[F(a|\theta_{2}) - F(y_{c_{1}}|\theta_{2})\right] \end{aligned} \qquad (45)$ where |S| denotes the cardinality of a finite set S. In view of (44) and the assumption that F(y|θ) is concave, one can verify that given |N₁(y_(c) ₁ )| and |N₂(y_(c) ₁ )|, $-|N_{1}(y_{c_{1}})| \ln\left[2F(y_{c_{1}}|\theta_{1}) - 1\right] - |N_{2}(y_{c_{1}})| \ln 2\left[F(a|\theta_{2}) - F(y_{c_{1}}|\theta_{2})\right]$ as a function of y_(c) ₁ is convex. Sort |Y₁|, |Y₂|, . . . |Y_(n)| in ascending order into W₁≦W₂≦ . . . ≦W_(n). Note that W_(n)=Y_(max). Let $m \triangleq \min\{i : W_{i} \geq d\}$. Then using an argument similar to the proof of Theorem 1 in subsection 3.2 (above), one can show that

$\begin{matrix} {{\max\left\{ {{{{g\left( {y_{c_{1}},b_{1},{\overset{\_}{\theta}❘Y_{1}^{n}}} \right)}\text{:}\mspace{14mu} d} \leq y_{c_{1}} < a},{0 \leq b_{1} \leq 1},\overset{\_}{\theta}} \right\}} = {\max\limits_{{b_{1},\overset{\_}{\theta}}\;}\mspace{14mu}{\max\limits_{y_{c_{1}} \in {\{{d,W_{m},W_{m + 1},\ldots\mspace{14mu},W_{n}}\}}}{{g\left( {y_{c_{1}},b_{1},{\overset{\_}{\theta}❘Y_{1}^{n}}} \right)}.}}}} & (46) \end{matrix}$

Therefore, the ML estimate of y_(c) ₁ is equal to one of d, W_(m), W_(m+1), . . . , W_(n).

For any y_(c) ₁ ∈{d, W_(m), W_(m+1), . . . , W_(n)}, let

$(b_{1}(y_{c_{1}}), \bar{\theta}(y_{c_{1}})) \triangleq \arg\max_{b_{1}, \bar{\theta}}\, g(y_{c_{1}}, b_{1}, \bar{\theta}|Y_{1}^{n})$.

Given y_(c) ₁ , b₁(y_(c) ₁ ) and θ_(i)(y_(c) ₁ ), i=1,2, can be computed in a manner similar to Algorithm 1 (FIG. 12). In particular, when f(y|θ) is Laplacian, θ_(i)(y_(c) ₁ ), i=1, 2, can be computed by the exponentially fast convergent Algorithm 2 (FIG. 13). Then the ML estimates of y_(c) ₁ , b₁, θ₁, and θ₂ are determined as

$\begin{matrix} {{y_{c_{1}}^{*} = {{argmax}_{y_{c_{1}} \in {\{{d,W_{m},{W_{{m + 1},}\ldots}\mspace{14mu},W_{n}}\}}}{g\left( {y_{c_{1}},{b_{1}\left( y_{c_{1}} \right)},\left. {\overset{\_}{\theta}\left( y_{c_{1}} \right)} \middle| Y_{1}^{n} \right.} \right)}}}\mspace{20mu}{b_{1}^{*} = {b_{1}\left( y_{c_{1}}^{*} \right)}}\mspace{20mu}{{\theta_{i}^{*} = {\theta_{i}\left( y_{c_{1}}^{*} \right)}},{i = 1},2.}} & (47) \end{matrix}$

Summarizing the above derivations into Algorithm 5 (FIG. 31) for computing (y_(c) ₁ *, b₁*, θ*), we have proved the following result.

Theorem 4: The vector (y_(c) ₁ *, b₁*, θ*) computed by Algorithm 5 is indeed the ML estimate of (y_(c) ₁ , b₁, θ) in the bi-segment TCM specified in (43) with l=1.

Remark 5: When f(y|θ) is Laplacian, the distribution of the tail in the bi-segment TCM specified in (43) with l=1 approaches the uniform distribution over (y_(c) ₁ , a]∪[−a, −y_(c) ₁ ) as θ₂ goes to ∞. Therefore, the BLTCM derived by Algorithm 5 in conjunction with Algorithm 2 (FIG. 13) is better than the LPTCM derived by Algorithm 1 (FIG. 12) in conjunction with Algorithm 2 (FIG. 13) in terms of modeling accuracy. This is further confirmed by experiments in Section 12, below.

10.3 Estimates of MLTCM Parameters

Suppose now that f(y|θ) is Laplacian. Plugging the Laplacian density function

$\frac{1}{2\lambda} e^{-|y|/\lambda}$ into (43), we get the MLTCM with l+1 segments given by

$p(y|\bar{y}_{c},\bar{b},\bar{\lambda}) \triangleq \begin{cases} \frac{b_{1}}{1 - e^{-y_{c_{1}}/\lambda_{1}}} \frac{1}{2\lambda_{1}} e^{-\frac{|y|}{\lambda_{1}}} & \text{if } |y| < y_{c_{1}} \\ \frac{b_{2}}{1 - e^{-(y_{c_{2}} - y_{c_{1}})/\lambda_{2}}} \frac{1}{2\lambda_{2}} e^{-\frac{|y| - y_{c_{1}}}{\lambda_{2}}} & \text{if } y_{c_{1}} < |y| < y_{c_{2}} \\ \vdots & \\ \frac{b_{l+1}}{1 - e^{-(y_{c_{l+1}} - y_{c_{l}})/\lambda_{l+1}}} \frac{1}{2\lambda_{l+1}} e^{-\frac{|y| - y_{c_{l}}}{\lambda_{l+1}}} & \text{if } y_{c_{l}} < |y| \leq a \\ 0 & \text{otherwise} \end{cases} \qquad (48)$ where $\bar{y}_{c} = (y_{c_{1}}, y_{c_{2}}, \ldots, y_{c_{l}})$ with $0 < y_{c_{1}} < y_{c_{2}} < \cdots < y_{c_{l}} < y_{c_{l+1}} = a$, $\bar{b} = (b_{1}, b_{2}, \ldots, b_{l+1})$ with $b_{i} \geq 0$ and $b_{1} + b_{2} + \cdots + b_{l} + b_{l+1} = 1$, $\bar{\lambda} = (\lambda_{1}, \lambda_{2}, \ldots, \lambda_{l+1})$ with $\lambda_{i} \geq 0$, i=1, 2, . . . , l+1, and

$p(y|\bar{y}_{c},\bar{b},\bar{\lambda}) = \max\left\{ \frac{b_{i}}{1 - e^{-(y_{c_{i}} - y_{c_{i-1}})/\lambda_{i}}} \frac{1}{2\lambda_{i}} e^{-(y_{c_{i}} - y_{c_{i-1}})/\lambda_{i}}, \frac{b_{i+1}}{1 - e^{-(y_{c_{i+1}} - y_{c_{i}})/\lambda_{i+1}}} \frac{1}{2\lambda_{i+1}} \right\}$ whenever $|y| = y_{c_{i}}$, i=1, 2, . . . , l, with $y_{c_{0}} = 0$.

In practice, neither l nor ( y _(c), b, λ) is known. Given Y₁ ^(n) with 0<Y_(max)<a, we next present a greedy algorithm for determining a desired value of l and for estimating the corresponding parameters y _(c), b, λ. To this end, let us first consider a generic truncated Laplacian distribution

$p(y) = \begin{cases} \frac{1}{1 - e^{-y_{c}/\lambda}} \frac{1}{2\lambda} e^{-|y|/\lambda} & \text{if } |y| < y_{c} \\ 0 & \text{otherwise.} \end{cases} \qquad (49)$

Let V^(T)=(V₁, V₂, . . . , V_(T)) be a sequence of samples drawn independently according to the generic truncated Laplacian distribution given in (49). From the proof of Theorem 2, the ML estimate λ(V^(T)) of λ from V^(T) is the unique solution to

$\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}} - C = 0 \qquad (50)$ where $C = \frac{1}{T}\sum_{i=1}^{T} |V_{i}|$ and by convention, the solution to (50) is equal to 0 if C≦0, and +∞ if C≧y_(c)/2. From the central limit theorem and strong law of large numbers, we have

$\frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}| - \left[\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}}\right]}{\sqrt{\sigma^{2}/T}} \rightarrow N(0,1)$ in distribution

and

$\frac{1}{T}\sum_{i=1}^{T} |V_{i}|^{2} - \left[\frac{1}{T}\sum_{i=1}^{T} |V_{i}|\right]^{2} \rightarrow \sigma^{2}$ with probability 1

where

$\sigma^{2} = E|V_{1}|^{2} - \left[E|V_{1}|\right]^{2} = \frac{1}{1 - e^{-y_{c}/\lambda}}\int_{0}^{y_{c}} y^{2}\, \frac{1}{\lambda} e^{-y/\lambda}\, dy - \left[\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}}\right]^{2}.$

Therefore, we have

$\frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}| - \left[\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}}\right]}{\sqrt{\frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}|^{2} - \left[\frac{1}{T}\sum_{i=1}^{T} |V_{i}|\right]^{2}}{T - 1}}} \rightarrow N(0,1)$ in distribution as T→∞.

In particular,

$\lim_{T \rightarrow \infty} \Pr\left\{ \left| \frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}| - \left[\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}}\right]}{\sqrt{\frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}|^{2} - \left[\frac{1}{T}\sum_{i=1}^{T} |V_{i}|\right]^{2}}{T - 1}}} \right| \leq Q^{-1}\left(\frac{\alpha}{2}\right) \right\} = 1 - \alpha \qquad (51)$ for any 0<α<1, where $Q(x) \triangleq \int_{x}^{+\infty} \frac{1}{\sqrt{2\pi}} e^{-u^{2}/2}\, du$.

Let λ⁺(V^(T)) be the unique solution to

$\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}} - \left[ \frac{1}{T}\sum_{i=1}^{T} |V_{i}| + Q^{-1}\left(\frac{\alpha}{2}\right) \sqrt{\frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}|^{2} - \left[\frac{1}{T}\sum_{i=1}^{T} |V_{i}|\right]^{2}}{T - 1}} \right] = 0 \qquad (52)$ and λ⁻(V^(T)) be the unique solution to

$\lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}} - \left[ \frac{1}{T}\sum_{i=1}^{T} |V_{i}| - Q^{-1}\left(\frac{\alpha}{2}\right) \sqrt{\frac{\frac{1}{T}\sum_{i=1}^{T} |V_{i}|^{2} - \left[\frac{1}{T}\sum_{i=1}^{T} |V_{i}|\right]^{2}}{T - 1}} \right] = 0. \qquad (53)$ Since

$s(\lambda) \triangleq \lambda - \frac{y_{c} e^{-y_{c}/\lambda}}{1 - e^{-y_{c}/\lambda}}$ as a function of λ is strictly increasing over λ>0, it follows from (51), (52), and (53) that (51) is equivalent to

$\begin{matrix} {{\lim\limits_{T\rightarrow\infty}{\Pr\left\{ {{\lambda^{-}\left( V^{T} \right)} \leq \lambda \leq {\lambda^{+}\left( V^{T} \right)}} \right\}}} = {1 - {\alpha.}}} & (54) \end{matrix}$

In other words, [λ⁻(V^(T)),λ⁺(V^(T))] is a confidence interval for estimating λ with asymptotic confidence level 1−α.
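The estimate of (50) and the interval endpoints of (52) and (53) all amount to solving s(λ)=target for different targets. The sketch below does this in Python using plain bisection, which is a substitute chosen here for simplicity and is not the exponentially convergent Algorithm 2 referenced in the text. The function names, the iteration count, the bracket-growing cap, and the overflow guard are assumptions of this example.

```python
import math
from statistics import NormalDist


def s(lam, y_c):
    """s(lambda) = lambda - y_c e^{-y_c/lambda} / (1 - e^{-y_c/lambda})."""
    x = y_c / lam
    if x > 700.0:  # the second term underflows to zero
        return lam
    return lam - y_c / math.expm1(x)


def solve_s(target, y_c, iters=200):
    """Solve s(lambda) = target by bisection; s is strictly increasing."""
    if target <= 0:
        return 0.0          # convention stated for (50)
    if target >= y_c / 2:
        return math.inf     # convention stated for (50)
    lo, hi = 0.0, y_c
    while s(hi, y_c) < target:
        hi *= 2.0
        if hi > 1e9 * y_c:  # assumed cap: treat as unbounded
            return math.inf
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if s(mid, y_c) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def laplacian_confidence_interval(samples, y_c, alpha=0.05):
    """Return (lambda_minus, lambda_hat, lambda_plus) per (53), (50), (52)."""
    T = len(samples)
    mean_abs = sum(abs(v) for v in samples) / T
    mean_sq = sum(v * v for v in samples) / T
    # Q^{-1}(alpha/2) is the upper alpha/2 quantile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(max(mean_sq - mean_abs ** 2, 0.0) / (T - 1))
    return (solve_s(mean_abs - half_width, y_c),
            solve_s(mean_abs, y_c),
            solve_s(mean_abs + half_width, y_c))
```

Because s(λ) is monotone, the three returned values are always ordered, and the interval widens as α decreases or T shrinks.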

The above derivation provides a theoretical basis for us to develop a greedy method for determining l and for estimating y _(c), b, and λ from Y₁ ^(n)=(Y₁, Y₂, . . . , Y_(n)). Select a desired confidence level 1−α such as the one with α=0.05 or 0.02. In view of (54), also select a threshold T*>0 such that for any T≧T*, Pr{λ⁻(V^(T))≦λ≦λ⁺(V^(T))} can be well approximated by 1−α. As before, sort |Y₁|, |Y₂|, . . . , |Y_(n)| in ascending order into W₁≦W₂≦ . . . ≦W_(n). Then let W=(W₁, W₂, . . . , W_(n)) and write (W_(i), W_(i+1), . . . , W_(j)), for any 1≦i≦j≦n, as W_(i) ^(j), and (W₁, W₂, . . . , W_(j)) simply as W^(j). Pick a T such that T≧T*, W_(T)>0, and W_(T+1)>W_(T). Let $\Delta T \triangleq |\{i : W_{i} = W_{T+1}\}|$.

Compute λ⁺(W^(T)) and λ⁻(W^(T)) as in (52) and (53) respectively with y_(c)=W_(T), by replacing V_(i) by W_(i). Compute λ(W^(T+ΔT)) as in (50) with y_(c)=W_(T+ΔT) by replacing V_(i) by W_(i). In view of the derivations from (50) to (54), W^(T) and W_(T+1) ^(T+ΔT) would be deemed to come from the same Laplacian model if λ(W^(T+ΔT))∈[λ⁻(W^(T)),λ⁺(W^(T))], and from different models otherwise. Using this criterion, one can then grow each segment recursively by padding each sample immediately adjacent to that segment into that segment until that sample and that segment are deemed to come from different models. This is the underlying idea behind the greedy method described as Algorithm 6 (FIG. 32) for estimating y_(c) ₁ , b₁, λ₁ from Y₁ ^(n) or equivalently W. The resulting estimates of y_(c) ₁ , b₁, λ₁ are denoted by Y_(c)(W), B(W), and Λ(W), respectively, for convenience.

Denote the value of T at the end of Algorithm 6 in response to W as T(W). After the first segment with length T₁=T(W) is identified and the values of y_(c) ₁ , b₁, and λ₁ are determined as y_(c) ₁ =Y_(c)(W), b₁=B(W), and λ₁=Λ(W), Algorithm 6 can be applied again to the translated remaining samples $(W_{T_{1}+1} - y_{c_{1}}, W_{T_{1}+2} - y_{c_{1}}, \ldots, W_{n} - y_{c_{1}})$ with y_(s)=y_(c) ₁ to determine the length T₂ of the second segment and the values of y_(c) ₂ , b₂, and λ₂:

$T_{2} = T(W_{T_{1}+1} - y_{c_{1}}, W_{T_{1}+2} - y_{c_{1}}, \ldots, W_{n} - y_{c_{1}})$

$y_{c_{2}} = y_{c_{1}} + Y_{c}(W_{T_{1}+1} - y_{c_{1}}, W_{T_{1}+2} - y_{c_{1}}, \ldots, W_{n} - y_{c_{1}})$

$b_{2} = B(W_{T_{1}+1} - y_{c_{1}}, W_{T_{1}+2} - y_{c_{1}}, \ldots, W_{n} - y_{c_{1}})$

$\lambda_{2} = \Lambda(W_{T_{1}+1} - y_{c_{1}}, W_{T_{1}+2} - y_{c_{1}}, \ldots, W_{n} - y_{c_{1}}).$

This procedure can be repeated again and again until there are no more remaining samples, yielding a greedy method described as Algorithm 7 (FIG. 33) for determining l and for estimating y _(c), b, and λ from Y₁ ^(n)=(Y₁, Y₂, . . . , Y_(n)), where the vector U denotes the dynamic translated remaining samples, and t denotes the cumulative length of all segments identified so far.

Let J denote the value of j at the end of Algorithm 7 in response to Y₁ ^(n). Then y_(c) _(J−1) is equal to a or Y_(max). In the case of y_(c) _(J−1) =a, the value of l in the MLTCM given by (48) is equal to J−2. Otherwise, l is equal to J−1, and the last segment is from Y_(max) to a with b_(l+1)=0 and λ_(l+1) defined arbitrarily. Since solutions to (50), (52), and (53) can be computed effectively by the exponentially fast convergent Algorithm 2 (FIG. 13), Algorithm 7 is very efficient and runs essentially in linear time once the sorting of Y₁ ^(n) into W is done. Experiments in Section 12 show that the resulting MLTCM has superior modeling accuracy.

11 Discrete Multiple Segment Transparent Composite Model

Though DCT in theory provides a mapping from a real-valued space to another real-valued space and generates continuous DCT coefficients, in practice (particularly in lossy image and video coding), DCT is often designed and implemented as a mapping from an integer-valued space (e.g., 8-bit pixels) to another integer-valued space and gives rise to integer DCT coefficients (e.g., 12-bit DCT coefficients in H.264). In addition, since most images and video are stored in a compressed format such as JPEG, H.264, etc., for applications (e.g., image enhancement, image retrieval, image annotation, etc.) based on compressed images and video, DCT coefficients are available only in their quantized values. Therefore, it is desirable to further improve the modeling accuracy of GMTCM for discrete (integer or quantized) DCT coefficients in practice by considering the discrete counterpart of continuous MTCMs, i.e., discrete MTCMs.

The particular discrete MTCM we shall consider and analyze in this section is the one where each segment is modeled by a truncated geometric distribution. The resulting discrete MTCM is broadly referred to as MGTCM in general and as BGTCM in the special case of two segments. To provide a unified treatment for both integer and quantized DCT coefficients, we introduce a quantization step size q. Then both integer and quantized DCT coefficients can be regarded as integers multiplied by a properly chosen step size.

11.1 MGTCM

Consider uniform quantization with dead zone, which is widely used in image and video coding (see, for example, H.264 and HEVC). Mathematically, the output of the uniform quantizer with dead zone Δ and step size q in response to an input X is given by

$Q(X) = q \times \mathrm{sign}(X) \times \mathrm{round}\left( \frac{|X| - (\Delta - q/2)}{q} \right)$ where q/2≦Δ<q. Assume that the input X is distributed according to the Laplacian distribution. Then the quantized index

$\mathrm{sign}(X) \times \mathrm{round}\left( \frac{|X| - (\Delta - q/2)}{q} \right)$ is distributed as follows

$p_{0} = 1 - e^{-\frac{\Delta}{\lambda}}, \qquad p_{i} = \frac{1}{2} e^{-\frac{\Delta}{\lambda}} \left[ 1 - e^{-\frac{q}{\lambda}} \right] e^{-\frac{q}{\lambda}(|i| - 1)}, \quad i = \pm 1, \pm 2, \ldots \qquad (55)$
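To make the dead-zone quantizer and the index distribution in (55) concrete, here is a small Python sketch. The function names are hypothetical, and the rounding rule (halves rounded up via floor(t+0.5)) is an assumption made so that an input with |X|=Δ maps to index ±1, consistent with p₀ being the probability of |X|<Δ.

```python
import math


def dead_zone_quantize(x, q, delta):
    """Uniform quantizer with dead zone delta and step size q (q/2 <= delta < q).

    Returns (reconstruction value, integer index).
    """
    sign = (x > 0) - (x < 0)
    # floor(t + 0.5) rounds halves up, avoiding Python's round-half-to-even.
    index = sign * math.floor((abs(x) - (delta - q / 2.0)) / q + 0.5)
    return q * index, index


def laplacian_index_pmf(i, q, delta, lam):
    """Probability of quantized index i for a Laplacian input, as in (55)."""
    if i == 0:
        return 1.0 - math.exp(-delta / lam)
    return (0.5 * math.exp(-delta / lam) * (1.0 - math.exp(-q / lam))
            * math.exp(-(q / lam) * (abs(i) - 1)))
```

Apart from the mass at zero, the probabilities decay by the constant factor e^(−q/λ) from one index magnitude to the next, which is the geometric behavior exploited by the discrete model.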

With the help of q, discrete (integer or quantized) DCT coefficients then take values of integers multiplied by q. (Hereafter, these integers will be referred to as DCT indices.) Note that p_(i) in (55) is essentially a geometric distribution. Using a geometric distribution to model each segment, we then get the MGTCM with l+1 segments given by

$p_{i}(\bar{K},\bar{\lambda},\bar{b}) \triangleq \begin{cases} b_{0} & \text{if } i = 0 \\ \frac{b_{1}}{2}\, e^{-\frac{q}{\lambda_{1}}(|i| - 1)}\, \frac{1 - e^{-\frac{q}{\lambda_{1}}}}{1 - e^{-\frac{q}{\lambda_{1}} K_{1}}} & \text{if } 0 < |i| \leq K_{1} \\ \frac{b_{2}}{2}\, e^{-\frac{q}{\lambda_{2}}(|i| - K_{1} - 1)}\, \frac{1 - e^{-\frac{q}{\lambda_{2}}}}{1 - e^{-\frac{q}{\lambda_{2}}[K_{2} - K_{1}]}} & \text{if } K_{1} < |i| \leq K_{2} \\ \vdots & \\ \frac{b_{l+1}}{2}\, e^{-\frac{q}{\lambda_{l+1}}(|i| - K_{l} - 1)}\, \frac{1 - e^{-\frac{q}{\lambda_{l+1}}}}{1 - e^{-\frac{q}{\lambda_{l+1}}[a - K_{l}]}} & \text{if } K_{l} < |i| \leq a \\ 0 & \text{otherwise} \end{cases} \qquad (56)$ where $\bar{K} = (K_{1}, \ldots, K_{l})$ with $K_{0} = 0 < K_{1} < K_{2} < \cdots < K_{l} < K_{l+1} = a$, λ=(λ₁, . . . λ_(l), λ_(l+1)) with λ_(i)≧0, b=(b₀, b₁, . . . b_(l+1)) with b_(i)≧0 and b₀+b₁+ . . . +b_(l+1)=1, and a is the largest index in a given sequence of DCT indices. Here a is assumed known, and K, λ, and b are model parameters.
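The piecewise definition in (56) can be evaluated directly. The following Python sketch is one possible way to do so; the function name and the list-based parameter layout are assumptions of this example, and it handles only finite, strictly positive λ values (the limiting cases λ=0 and λ=∞ are not covered).

```python
import math


def mgtcm_pmf(i, K, lam, b, a, q):
    """Evaluate p_i of (56).

    K   : boundaries [K_1, ..., K_l], strictly increasing, all below a
    lam : [lambda_1, ..., lambda_{l+1}], finite and positive
    b   : [b_0, b_1, ..., b_{l+1}], non-negative and summing to 1
    """
    if i == 0:
        return b[0]
    mag = abs(i)
    if mag > a:
        return 0.0
    edges = [0] + list(K) + [a]  # K_0 = 0 and K_{l+1} = a
    for seg in range(1, len(edges)):
        if edges[seg - 1] < mag <= edges[seg]:
            r = math.exp(-q / lam[seg - 1])      # geometric ratio of this segment
            width = edges[seg] - edges[seg - 1]  # number of index magnitudes covered
            offset = mag - edges[seg - 1] - 1    # position inside the segment
            return 0.5 * b[seg] * r ** offset * (1.0 - r) / (1.0 - r ** width)
    return 0.0
```

Summing the returned values over all indices from −a to a should give 1 for any valid parameter set, since each segment's truncated geometric weights are normalized before being scaled by b_(i)/2 on each side.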

11.2 ML Estimates of MGTCM Parameters

Let u^(n)=u₁, u₂, . . . , u_(n) be a sequence of DCT indices. Assume that u^(n) behaves according to the MGTCM defined by (56) with $u_{\max} \triangleq \max\{|u_{i}| : 1 \leq i \leq n\} \leq a$. When the number l+1 of segments is given, the parameters K, λ, and b can be estimated via ML estimation from u^(n).

Let N₀={j:u_(j)=0}. For any 1≦i≦l+1, let

$N_{i}(\bar{K}) = \{j : K_{i-1} < |u_{j}| \leq K_{i}\}$ and define $L_{i}(\bar{K},\lambda) \triangleq |N_{i}(\bar{K})| \ln\frac{1 - e^{-\frac{q}{\lambda}}}{1 - e^{-\frac{q}{\lambda}[K_{i} - K_{i-1}]}} - \frac{q}{\lambda}\sum_{j \in N_{i}(\bar{K})} \left( |u_{j}| - K_{i-1} - 1 \right).$

Given l, the log-likelihood function of u^(n) according to (56) is then equal to

$G(\bar{K},\bar{\lambda},\bar{b}) \triangleq |N_{0}| \ln b_{0} + \sum_{i=1}^{l+1} |N_{i}(\bar{K})| \ln\frac{b_{i}}{2} + \sum_{i=1}^{l+1} L_{i}(\bar{K},\lambda_{i}). \qquad (57)$

Given K, further define $(\bar{\lambda}(\bar{K}), \bar{b}(\bar{K})) \triangleq \arg\max_{\bar{\lambda}, \bar{b}}\, G(\bar{K},\bar{\lambda},\bar{b})$.

Then it follows from (57) that

$b_{0}(\bar{K}) = \frac{|N_{0}|}{n},$ and for any 1≦i≦l+1,

$b_{i}(\bar{K}) = \frac{|N_{i}(\bar{K})|}{n}$ and $\lambda_{i}(\bar{K}) = \arg\max_{0 \leq \lambda \leq \infty} L_{i}(\bar{K},\lambda). \qquad (58)$

Finally, the ML estimate of K is equal to K*=arg max _(K) G( K , λ( K ), b ( K )).  (59)

Accordingly, the ML estimates of λ and b are respectively equal to λ_(i)*=λ_(i)( K *) and b _(i) *=b _(i)( K *) for any 1≦i≦l+1 with

$b_{0}^{*} = {\frac{N_{0}}{n}.}$

Given K, λ_(i)( K) in (58) can be computed effectively by the exponentially fast convergent Algorithm 3 (FIG. 14); in particular, as shown therein, λ_(i)( K) is the unique solution to

$\frac{1}{e^{q/\lambda} - 1} - \frac{K_{i} - K_{i-1}}{e^{[K_{i} - K_{i-1}]q/\lambda} - 1} - C = 0 \qquad (60)$ where $C = \frac{1}{|N_{i}(\bar{K})|}\sum_{j \in N_{i}(\bar{K})} \left( |u_{j}| - K_{i-1} - 1 \right)$ and by convention, the solution to (60) is equal to 0 if C≦0, and +∞ if C≧[K_(i)−K_(i−1)−1]/2. Therefore, the complexity of computing K*, λ*, and b* lies mainly in searching over all possible combinations of K to find K*, the complexity of which is O(a^(l)). In the case of l=1, the complexity is essentially the same as that of computing the ML estimates of GMTCM parameters. In addition, since the distribution of the tail in the MGTCM defined by (56) converges to the uniform distribution over (K_(l), a]∪[−a, −K_(l)) as λ_(l+1) goes to ∞, BGTCM offers better modeling accuracy than does GMTCM, which is further confirmed by experiments in Section 12.
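For a fixed boundary vector K, the closed-form parts of the ML step, namely b₀, the b_(i) of (58), and the per-segment statistic C appearing in (60), reduce to simple counting. The Python sketch below illustrates just that bookkeeping; the function name is hypothetical, returning C=0.0 for an empty segment is an assumption of this example, and solving (60) for each λ_(i) is deliberately left out.

```python
def mgtcm_segment_statistics(indices, K, a):
    """For fixed boundaries K, return (b_0, [b_1..b_{l+1}], [C_1..C_{l+1}]).

    indices : sequence of integer DCT indices u_1, ..., u_n
    K       : boundaries [K_1, ..., K_l]
    a       : largest index magnitude (K_{l+1} = a)
    """
    n = len(indices)
    edges = [0] + list(K) + [a]  # K_0 = 0 and K_{l+1} = a
    b0 = sum(1 for u in indices if u == 0) / n
    b, C = [], []
    for seg in range(1, len(edges)):
        # Offsets |u_j| - K_{i-1} - 1 of the samples falling in segment i.
        offsets = [abs(u) - edges[seg - 1] - 1
                   for u in indices
                   if edges[seg - 1] < abs(u) <= edges[seg]]
        b.append(len(offsets) / n)
        # Assumed fallback: C = 0.0 when no sample falls in the segment.
        C.append(sum(offsets) / len(offsets) if offsets else 0.0)
    return b0, b, C
```

In a full search over K, these statistics would be recomputed (or updated incrementally) for each candidate boundary vector before evaluating G in (57).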

11.3 Greedy Estimation of l and Other MGTCM Parameters

When the number l+1 of segments in the MGTCM defined by (56) is unknown, it has to be estimated as well along with other parameters K, λ, and b. In this subsection, we present a greedy algorithm for determining a desired value of l and estimating the corresponding parameters K, λ, and b. The algorithm is similar to Algorithms 6 and 7 in principle. As such, we shall point out only places where modifications are needed.

Consider a generic truncated geometric distribution

$\begin{matrix} {{p_{i}\left( {K,\lambda} \right)}\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{1}{2}{\mathbb{e}}^{{- \frac{q}{\lambda}}{({{i} - 1})}}\frac{1 - {\mathbb{e}}^{- \frac{q}{\lambda}}}{1 - {\mathbb{e}}^{{- \frac{q}{\lambda}}K}}} & {{{if}\mspace{14mu} 0} < {i} \leq K} \\ 0 & {{otherwise}.} \end{matrix}\; \right.} & (61) \end{matrix}$
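As a quick check on (61), the following Python sketch evaluates the generic truncated geometric probability for an index i and sums it over a range of indices. The function name and the example values K=8, λ=2.5, q=1 are arbitrary assumptions chosen for illustration.

```python
import math

def truncated_geometric_pmf(i, K, lam, q=1.0):
    """Probability of index i under (61); zero outside 0 < |i| <= K."""
    m = abs(i)
    if m == 0 or m > K:
        return 0.0
    t = q / lam
    return (0.5 * math.exp(-t * (m - 1))
            * (1 - math.exp(-t)) / (1 - math.exp(-t * K)))

# Summing over all indices should give total probability one.
total = sum(truncated_geometric_pmf(i, K=8, lam=2.5) for i in range(-10, 11))
```

The factor 1/2 splits the mass evenly between positive and negative indices, which is why the probabilities at i and −i coincide.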

Let V^(T)=(V₁, V₂, . . . , V_(T)) be a sequence of samples drawn independently according to the generic truncated geometric distribution given by (61). As shown in Algorithm 3 (FIG. 14), the ML estimate λ(V^(T)) of λ from V^(T) is the unique solution to

$\begin{matrix} {{{\frac{1}{{\mathbb{e}}^{q/\lambda} - 1} - \frac{K}{{\mathbb{e}}^{{Kq}/\lambda} - 1} - C} = 0}{where}{C = {\frac{1}{T}{\sum\limits_{i = 1}^{T}\left( {{V_{i}} - 1} \right)}}}} & (62) \end{matrix}$ and by convention, the solution to (62) is equal to 0 if C≦0, and +∞ if C≧(K−1)/2. In parallel with (52) and (53), let λ⁺(V^(T)) and λ⁻(V^(T)) be respectively the unique solution to

$\begin{matrix} {{{\frac{1}{{\mathbb{e}}^{q/\lambda} - 1} - \frac{K}{{\mathbb{e}}^{{Kq}/\lambda} - 1} - \left\lbrack {C + {{Q^{- 1}\left( \frac{\alpha}{2} \right)}\sqrt{\frac{{\frac{1}{T}{\sum\limits_{i = 1}^{T}{V_{i}}^{2}}} - \left\lbrack {\frac{1}{T}{\sum\limits_{i = 1}^{T}{V_{i}}}} \right\rbrack^{2}}{T - 1}}}} \right\rbrack} = 0}\mspace{79mu}{and}} & (63) \\ {{\frac{1}{{\mathbb{e}}^{q/\lambda} - 1} - \frac{K}{{\mathbb{e}}^{{Kq}/\lambda} - 1} - \left\lbrack {C - {{Q^{- 1}\left( \frac{\alpha}{2} \right)}\sqrt{\frac{{\frac{1}{T}{\sum\limits_{i = 1}^{T}{V_{i}}^{2}}} - \left\lbrack {\frac{1}{T}{\sum\limits_{i = 1}^{T}{V_{i}}}} \right\rbrack^{2}}{T - 1}}}} \right\rbrack} = 0.} & (64) \end{matrix}$ Then (54) remains valid. Note that b₀=|N₀|/n. Let n̂=n−|N₀|. Sort |u_(i)|, i∉N₀, in ascending order into W₁≦W₂≦ . . . ≦W_(n̂), and let W=(W₁, W₂, . . . , W_(n̂)).
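The bracketed constants in (63) and (64) differ from C in (62) only by a confidence margin. The Python sketch below computes the three constants from a sample sequence. It assumes that Q denotes the standard Gaussian tail function, so that Q⁻¹(α/2) is the (1−α/2) quantile of the standard normal distribution, and that the sums are taken over sample magnitudes; both are assumptions, since the definitions accompanying (52) and (53) are not repeated here. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def shifted_targets(V, alpha=0.05):
    """Return (C_minus, C, C_plus), the constants in (64), (62), (63)."""
    T = len(V)
    mags = [abs(v) for v in V]          # assumption: magnitudes are used
    C = sum(m - 1 for m in mags) / T
    mean = sum(mags) / T
    mean_sq = sum(m * m for m in mags) / T
    # Assumption: Q is the standard Gaussian tail function, so
    # Q^{-1}(alpha/2) is the (1 - alpha/2) quantile of N(0, 1).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    delta = z * math.sqrt((mean_sq - mean ** 2) / (T - 1))
    return C - delta, C, C + delta

c_minus, c, c_plus = shifted_targets([1, -2, 3, 4])
```

Each of the three constants can then be passed to whatever solver is used for (62) to obtain λ⁻(V^(T)), λ(V^(T)), and λ⁺(V^(T)), respectively.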

Then a greedy algorithm similar to Algorithm 6 can be used to estimate K₁, b₁, and λ₁ from W, which is described in detail in Algorithm 8 (FIG. 34). The resulting estimates of K₁, b₁, and λ₁ from W are denoted by K(W), B(W), and Λ(W), respectively, for convenience.

Denote the value of T at the end of Algorithm 8 in response to W as T(W). Repeatedly applying Algorithm 8 to the translated remaining samples until there are no more remaining samples, we get a greedy method, described as Algorithm 9 (FIG. 35), for determining l and for estimating K, b, and λ from u^(n), where the vector U denotes the dynamic translated remaining samples, and t denotes the cumulative length of all segments identified so far.

In practical implementation of Algorithms 8 and 9, the step of sorting u_(i) could be avoided. Instead, one can equivalently collect the data histogram {h_(j), j=0,1, . . . , a} from u^(n). Since solutions to (62) to (64) can be effectively computed by the exponentially fast convergent Algorithm 3 (FIG. 14), the major complexity of Algorithms 8 and 9 lies essentially in collecting the data histogram {h_(j), j=0, 1, . . . , a}. Experiments in Section 12 show that the MGTCM derived by Algorithms 8 and 9 has superior modeling accuracy, where the number of segments is on average around 7.
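The histogram shortcut described above can be sketched in Python as follows: the histogram {h_(j)} is collected in one pass, and the segment statistic C used in (60) is then computed from the histogram alone, without sorting. The function names are hypothetical, and the sketch assumes every index magnitude is at most a.

```python
def magnitude_histogram(u, a):
    """h[j] = number of indices with |u_i| == j, for j = 0..a."""
    h = [0] * (a + 1)
    for x in u:
        h[abs(x)] += 1
    return h

def segment_offset_mean(h, K_prev, K_cur):
    """Mean of (|u_j| - K_prev - 1) over samples with K_prev < |u_j| <= K_cur."""
    count = sum(h[K_prev + 1:K_cur + 1])
    total = sum(h[j] * (j - K_prev - 1) for j in range(K_prev + 1, K_cur + 1))
    # An empty segment has no defined mean; 0.0 is returned as a placeholder.
    return total / count if count else 0.0

h = magnitude_histogram([0, 1, -1, 2, 3, -3], a=3)
```

Since the histogram has only a+1 bins, every candidate boundary can be evaluated from it without revisiting the n original samples.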

12 Experimental Results on Tests of Modeling Accuracy

This section presents experimental results obtained from applying BLTCM to model continuous DCT coefficients with comparison to GG and LPTCM in non-multiple TCM as described above, and applying MGTCM and BGTCM to model DCT indices with comparison to GG, Laplacian and GMTCM in non-multiple TCM as described above. As DCT coefficients in real-world applications are often quantized or take integer values (e.g., an integer approximation of the DCT is used in H.264 and HEVC), this section focuses mostly on the modeling performance of the discrete models MGTCM and BGTCM.

12.1 Tests of Modeling Accuracy

Two criteria are applied again in this section to test the modeling accuracy of the developed models and to compare them with other models in the literature. Again, the first one is the χ², defined as follows,

$\begin{matrix} {{\chi^{2} = {\sum\limits_{i = 1}^{I}\frac{\left( {n_{i} - {n \cdot q_{i}}} \right)^{2}}{n \cdot q_{i}}}},} & (65) \end{matrix}$ where I is the number of intervals into which the sample space is partitioned, n is the total number of samples, n_(i) denotes the number of samples falling into the ith interval, and q_(i) is the probability, estimated by the underlying theoretical model, that a sample falls into the ith interval. Another criterion is the Kullback-Leibler (KL) divergence, which is defined as

$\begin{matrix} {{d = {\sum\limits_{i = 1}^{I}{p_{i}\ln\frac{p_{i}}{q_{i}}}}},} & (66) \end{matrix}$ where I is the alphabet size of a discrete source, p_(i) represents probabilities observed from the data, and q_(i) stands for probabilities obtained from a given model. Note that p_(i)=0 is dealt with by defining 0 ln 0=0.
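Both criteria are straightforward to compute. The following Python sketch evaluates (65) from observed interval counts and model probabilities, and (66) from observed and model probabilities with the convention 0 ln 0=0. The function names are hypothetical, and the sketch assumes every model probability q_(i) is strictly positive.

```python
import math

def chi_square(counts, probs):
    """Chi-square score (65): counts are n_i, probs are model q_i."""
    n = sum(counts)
    return sum((n_i - n * q_i) ** 2 / (n * q_i)
               for n_i, q_i in zip(counts, probs))

def kl_divergence(p, q):
    """KL divergence (66); terms with p_i == 0 contribute zero."""
    return sum(p_i * math.log(p_i / q_i)
               for p_i, q_i in zip(p, q) if p_i > 0)

score = chi_square([10, 20, 30], [0.2, 0.3, 0.5])
d = kl_divergence([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
```

Smaller values of either quantity indicate a closer match between the model and the data.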

When a comparison is conducted, a factor w_(d) is calculated to be the percentage of DCT frequencies among all tested AC positions that are in favor of one model over another model in terms of having a smaller KL divergence from the data distribution. Another factor w_(χ²) is defined in a similar way, except that the comparison is carried out based on the χ² test results for individual frequencies.
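A minimal Python sketch of such a factor is given below: it takes the per-frequency scores (KL divergence or χ²) of two models and returns the percentage of frequencies at which the first model has the smaller score. The function name is hypothetical, and counting only strictly smaller scores as wins is an assumption, since the treatment of ties is not specified.

```python
def win_percentage(scores_a, scores_b):
    """Percentage of frequencies where model A scores strictly lower than B."""
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    return 100.0 * wins / len(scores_a)

# Model A is better at 9 of 15 tested frequencies (made-up scores).
w = win_percentage([0.0] * 9 + [2.0] * 6, [1.0] * 15)
```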

To illustrate the improvement of BGTCM over GMTCM for modeling low frequency DCT coefficients, experimental results are collected for low frequency DCT coefficients. Specifically, a zig-zag scan is performed and only the first 15 ACs along the scanning order are used for testing the modeling accuracy.
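The selection of the first 15 AC positions can be sketched in Python as follows. The sketch assumes 8×8 blocks and the zig-zag order used by JPEG, in which the DC coefficient at position (0, 0) comes first; the block size and the function name are assumptions for illustration.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        # all positions on the anti-diagonal row + col == s, rows ascending
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # odd diagonals run top-right to bottom-left, even ones the reverse
        order.extend(diag if s % 2 else diag[::-1])
    return order

# Skip the DC position and keep the first 15 AC positions.
low_acs = zigzag_order()[1:16]
```

The resulting list can be used to index each block's coefficient array when gathering per-frequency samples.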

Three sets of testing images are deliberately selected to cover a variety of image content; they are the same sets used in Section 5 described above. The first set, as shown in FIG. 6, includes 9 standard 512×512 images with faces, animals, buildings, landscapes, etc. The second set, as shown in FIG. 7, has five high definition (1920×1080) frames from the class-B sequences used for HEVC standardization tests [3]. The third set, as shown in FIG. 8, is also taken from HEVC test sequences, namely the Class F sequences for screen content, i.e., frames that are captured from computer screens.

12.2 Overall Comparisons for Each Image

For modeling continuous DCT coefficients, experiments have been conducted to do overall comparison among BLTCM, LPTCM, and GG model. For modeling DCT indices, comparative experiments have been conducted for two pairs of models. The first is to compare BGTCM and the GG model. The second comparison is between BGTCM and MGTCM. For the overall comparison between other pairs of models, the result can be seen without experimental data. For example, BGTCM always outperforms GMTCM and GMTCM always has a better modeling accuracy than the Laplacian model.

Table 10 (FIG. 36) shows the percentage w_(χ²) (w_(d), respectively) of frequencies among the 15 low AC positions that are in favor of BLTCM over the GG model for each of 9 images in Set 1 in terms of the χ² metric (KL divergence, respectively). To illustrate the improvement of BLTCM over LPTCM, Table 11 (FIG. 37) shows the percentage w_(χ²) (w_(d), respectively) of frequencies among the 15 low AC positions that are in favor of LPTCM over the GG model for each of 9 images in Set 1 in terms of the χ² metric (KL divergence, respectively). From Tables 10 and 11, it is clear that BLTCM significantly improves on LPTCM and outperforms the GG model in terms of modeling accuracy for modeling continuous low frequency DCT coefficients.

Table 12 (FIG. 38) shows the percentage w_(χ²) (w_(d), respectively) of frequencies among the 15 low AC positions that are in favor of BGTCM over the GG model for each of the images in all test sets in terms of the χ² metric (KL divergence, respectively). Note that all images are coded by JPEG with QF=100 and the discrete/quantized DCT coefficients are read directly from the JPEG file. From Table 12, it is clear that BGTCM provides significantly better modeling accuracy overall than the GG model for these low-frequency DCT coefficients. To illustrate the improvement of BGTCM over GMTCM, Table 13 (FIG. 39) presents the overall comparative results between GMTCM and the GG model, which shows a fairly tied performance between GMTCM and the GG model for modeling the low frequencies. Comparing Table 12 with Table 13, it is clear that BGTCM achieves its goal of improving the accuracy for modeling low frequency DCT coefficients while retaining simplicity and practicality similar to those of GMTCM.

Table 14 (FIG. 40) presents the percentage w_(χ²) (w_(d), respectively) of frequencies among the 15 low AC positions that are in favor of the MGTCM derived from Algorithms 8 and 9 over BGTCM for each of the images in all test sets in terms of the χ² metric (KL divergence, respectively). Note that since the greedy algorithm was used, the modeling accuracy of the resulting MGTCM is not always guaranteed to be superior to that of BGTCM, although from the model establishment point of view, BGTCM is considered a special case of MGTCM. Nevertheless, Table 14 shows that the greedy algorithm for MGTCM works fairly well and the resulting MGTCM generally provides better modeling accuracy than BGTCM. This is also true for the comparison between the MLTCM derived from Algorithms 2 and 3 and BLTCM, the details of which are hence omitted here.

12.3 Comparisons of Modeling Accuracy for Individual Frequencies

While Tables 12-14 show comparative results for each image over all frequencies, it is of some interest to see the performance of all models for individual frequencies. Due to space limits, only results for four images are shown in FIGS. 27-30. The four images were selected in a way that favors the other models rather than the proposed models. Specifically, one image is selected from each test set as the one with the worst performance by the proposed BGTCM in Table 12, i.e., ‘boat’, ‘CS’, and ‘B5’. In addition, the ‘lenna’ image, whose statistics have been well studied, is also selected.

As FIGS. 27-30 also show the value of the χ² scores and the KL divergence, it helps to compare BGTCM with the GG model for their overall performance. For example, for the image of ‘boat’ and by the measurement of the χ² score, BGTCM wins over the GG model for 9 frequencies and loses for 6 frequencies. Yet, a close look at the left panel of FIG. 27 shows that among the 15 ACs, there are 8 frequencies for which BGTCM has a dramatically lower χ² score, while for the other 7 frequencies, including 6 in favor of the GG model and 1 in favor of BGTCM, the χ² scores by BGTCM and the GG model are very close to each other. As a result, though in Table 12 we can only report that BGTCM wins 60% over the GG model for modeling the low frequencies of ‘boat’, FIG. 27 shows that BGTCM clearly outperforms the GG model for modeling ‘boat’ overall. Similar results can be seen for ‘B5’ while examining the detailed comparison between BGTCM and the GG model as shown in FIG. 30, which shows a clear win by BGTCM over the GG model while Table 12 only reports that BGTCM wins 67% over the GG model.

13 Conclusions to MTCM

Motivated by the need to improve modeling accuracy, especially for low frequency DCT coefficients, while having simplicity and practicality similar to those of the Laplacian model, the second part of this disclosure has extended the transparent composite model (TCM) concept disclosed in the first part of the disclosure (i.e., sections 1 to 7) by further separating DCT coefficients into multiple segments and modeling each segment by a different parametric distribution such as truncated Laplacian and geometric distributions, yielding a model dubbed a multiple segment TCM (MTCM). In the case of bi-segment TCMs, an efficient online algorithm has been developed for computing the maximum likelihood (ML) estimates of their parameters. In the case of general MTCMs based on truncated Laplacian and geometric distributions (referred to as MLTCM and MGTCM, respectively), a greedy algorithm has been further presented for determining a desired number of segments and for estimating other corresponding MTCM parameters. It has been shown that (1) the bi-segment TCM based on truncated Laplacian (BLTCM) and MLTCM derived by the greedy algorithm offer the best modeling accuracy for continuous DCT coefficients while having simplicity and practicality similar to those of Laplacian; and (2) the bi-segment TCM based on truncated geometric distribution (BGTCM) and MGTCM derived by the greedy algorithm offer the best modeling accuracy for discrete DCT coefficients while having simplicity and practicality similar to those of geometric distribution, thus making them a desirable choice for modeling continuous and discrete DCT coefficients (or other similar types of data) in real-world applications, respectively.

In accordance with an example embodiment, there is provided a non-transitory computer-readable medium containing instructions executable by a processor for performing any or all of the described methods.

In any or all of the described methods, the boxes or algorithm lines may represent events, steps, functions, processes, modules, state-based operations, etc. While some of the above examples have been described as occurring in a particular order, it will be appreciated by persons skilled in the art that some of the steps or processes may be performed in a different order provided that the result of the changed order of any given step will not prevent or impair the occurrence of subsequent steps. Furthermore, some of the messages or steps described above may be removed or combined in other embodiments, and some of the messages or steps described above may be separated into a number of sub-messages or sub-steps in other embodiments. Even further, some or all of the steps may be repeated, as necessary. Elements described as methods or steps similarly apply to systems or subcomponents, and vice-versa. Reference to such words as “sending” or “receiving” could be interchanged depending on the perspective of the particular device.

While some example embodiments have been described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that some example embodiments are also directed to the various components for performing at least some of the aspects and features of the described processes, be it by way of hardware components, software or any combination of the two, or in any other manner. Moreover, some example embodiments are also directed to a pre-recorded storage device or other similar computer-readable medium including program instructions stored thereon for performing the processes described herein. The computer-readable medium includes any non-transient storage medium, such as RAM, ROM, flash memory, compact discs, USB sticks, DVDs, HD-DVDs, or any other such computer-readable memory devices.

Although not specifically illustrated, it will be understood that the devices described herein include one or more processors and associated memory. The memory may include one or more application programs, modules, or other programming constructs containing computer-executable instructions that, when executed by the one or more processors, implement the methods or processes described herein.

The various embodiments presented above are merely examples and are in no way meant to limit the scope of this disclosure. Variations of the innovations described herein will be apparent to persons of ordinary skill in the art, such variations being within the intended scope of the present disclosure. In particular, features from one or more of the above-described embodiments may be selected to create alternative embodiments comprised of a sub-combination of features which may not be explicitly described above. In addition, features from one or more of the above-described embodiments may be selected and combined to create alternative embodiments comprised of a combination of features which may not be explicitly described above. Features suitable for such combinations and sub-combinations would be readily apparent to persons skilled in the art upon review of the present disclosure as a whole. The subject matter described herein intends to cover and embrace all suitable changes in technology.

All patent references and publications described or referenced herein are hereby incorporated by reference in their entirety into the Detailed Description of Example Embodiments.

References

[1] T. Acharya and A. K. Ray, Image Processing—Principles and Applications, Wiley InterScience, 2006.

[2] F. Muller, “Distribution shape of two-dimensional DCT coefficients of natural images”, Electronics Letters, Vol. 29, No. 22, October 1993.

[3] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, December 2012.

[4] T. Eude, R. Grisel, H. Cherifi, and R. Debrie, “On the Distribution of The DCT Coefficients”, Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 5, pp. 365-368, April 1994.

[5] A. Briassouli, P. Tsakalides, and A. Stouraitis, “Hidden Messages in Heavy-tails: DCT-Domain Watermark Detection using Alpha-Stable Models”, IEEE Transactions on Multimedia, Vol. 7, No. 4, August 2005.

[6] M. I. H. Bhuiyan, M. O. Ahmad, and M. N. S. Swamy, “Modeling of the DCT Coefficients of Images”, Proc. of IEEE International Symposium on Circuits and Systems, 2008, pp. 272-275.

[7] N. Kamaci, Y. Altunbasak, and R. Mersereau, “Frame Bit Allocation for the H.264/AVC Video Coder Via Cauchy-Density-Based Rate and Distortion Models”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, No. 8, August 2005.

[8] S. Nadarajah, “Gaussian DCT Coefficient Models”, Acta App. Math, 2009, 106: 455-472.

[9] E. Lam and J. Goodman, “A Mathematical Analysis of the DCT Coefficient Distributions for Images”, IEEE Transactions on Image Processing, Vol. 9, No. 10, October 2000.

[10] I-M. Pao and M. T. Sun, “Modeling DCT Coefficients for Fast Video Encoding”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 4, pp. 608-616, June 1999.

[11] J. Rice, Mathematical Statistics and Data Analysis, Second ed., Duxbury Press, 1995.

[12] G. Sullivan, “Efficient scalar quantization of exponential and Laplacian random variables”, IEEE Transactions on Information Theory, vol. 42, no. 5, pp. 1365-1374, 1996.

[13] M. Do and M. Vetterli, “Wavelet-based Texture Retrieval using Generalized Gaussian Density and Kullback-Leibler Distance”, IEEE Transactions on Image Processing, Vol. 11, No. 2, February 2002.

[14] C. Tu, J. Liang, and T. Tran, “Adaptive runlength coding,” IEEE Signal Processing Letters, Vol. 10, No. 3, pp. 61-64, March 2003.

[15] C. Tu and T. Tran, “Context-based entropy coding of block transform coefficients for image compression,” IEEE Transactions on Image Processing, Vol. 11, No. 11, pp. 1271-1283, November 2002.

[16] X. Wu and N. Memon, “Context-based, adaptive, lossless image coding,” IEEE Transactions on Communications, Vol. 45, pp. 437-444, April 1997. 

The invention claimed is:
 1. A method for modelling a set of transform coefficients, the method being performed by a device and comprising: determining at least one boundary coefficient value; determining one or more parameters for a first distribution model for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values; determining parameters for at least one further distribution model for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values; and performing a device operation on at least part of a composite distribution model which is a composite of the first distribution model and the at least one further distribution model having the respective determined parameters.
 2. A method as claimed in claim 1, wherein said determining at least one boundary coefficient value includes determining the at least one boundary coefficient value which satisfies a maximum likelihood estimation between the set of transform coefficients and the composite distribution model.
 3. A method as claimed in claim 1, wherein the at least one further distribution model includes at least one parametric distribution model.
 4. A method as claimed in claim 3, wherein the at least one parametric distribution model includes at least one of: a truncated Laplacian distribution model, a truncated generalized Gaussian model, and a truncated geometric distribution model.
 5. A method as claimed in claim 3, wherein determining parameters for the at least one further parametric distribution model includes determining a probability for the at least one further parametric distribution model and the parameters of the at least one further parametric distribution model itself.
 6. A method as claimed in claim 1, wherein the first distribution model includes a uniform distribution model.
 7. A method as claimed in claim 1, wherein the first distribution model includes a parametric distribution model.
 8. A method as claimed in claim 1, wherein the composite distribution model is continuous and wherein the set of transform coefficients is continuous.
 9. A method as claimed in claim 1, wherein the composite distribution model is discrete and wherein the set of transform coefficients is discrete.
 10. A method as claimed in claim 1, wherein a candidate for each of the determined at least one boundary coefficient value is selected from the set of transform coefficients.
 11. A method as claimed in claim 1, wherein the device operation includes at least one of: storing on a memory, transmitting to a second device, transmitting to a network, outputting to an output device, displaying on a display screen, determining image similarity between different images by comparing at least part of the composite distribution model, determining a goodness-of-fit between the composite distribution model and the set of transform coefficients, and generating an identifier which associates the composite distribution model with the set of discrete transform coefficients.
 12. A method as claimed in claim 1, wherein the device operation includes at least one of: quantizing at least some of the set of transform coefficients based on the at least part of a composite distribution model; and performing lossless or lossy encoding on at least some of the set of transform coefficients based on the at least part of a composite distribution model.
 13. A method as claimed in claim 1, wherein the device operation includes performing a device function on at least one of the subsets of transform coefficients which are each bounded by the at least one boundary coefficient value.
 14. A method as claimed in claim 1, wherein said at least one boundary coefficient value includes at least two boundary coefficient values, and wherein the at least one further distribution model includes at least two further distribution models.
 15. A method as claimed in claim 1, wherein the set of transform coefficients includes: discrete cosine transform coefficients, Laplace transform coefficients, Fourier transform coefficients, wavelet transform coefficients, or prediction residuals arising from prediction.
 16. A method as claimed in claim 1, wherein the set of transform coefficients is stored in a memory of the device, or stored on a second device, or generated from a source media data.
 17. A method as claimed in claim 1, wherein the at least one further distribution model is according to a truncated probability density function ${\frac{b}{{2{F\left( y_{c} \middle| \theta \right)}} - 1}{f\left( y \middle| \theta \right)}},{{y} < y_{c}},$ wherein the first distribution is according to a uniform distribution function $\frac{1 - b}{2\left( {a - y_{c}} \right)},{{y} > y_{c}},$ wherein y_(c) is one of the boundary coefficient values, f(y|θ) is a probability density function with parameters θ∈Θ, where θ is a vector, and Θ is a parameter space, F(y|θ) is a corresponding cumulative density function to f(y|θ), b is a probability parameter, and a represents the largest magnitude a sample y can take.
 18. A method as claimed in claim 1, wherein the at least one further distribution model is according to a truncated Laplacian probability density function ${\frac{b}{1 - {\mathbb{e}}^{{- y_{c}}/\lambda}}\frac{1}{2\lambda}{\mathbb{e}}^{{- {y}}/\lambda}},{{y} < y_{c}},$ wherein the first distribution is according to a uniform distribution function $\frac{1 - b}{2\left( {a - y_{c}} \right)},{y_{c} < {y} \leq a},$ wherein y_(c) is one of the boundary coefficient values, b is a probability parameter, λ is the parameter of the at least one further distribution model, and a represents the largest magnitude a sample y can take.
 19. A method as claimed in claim 18, further comprising determining λ_(y) _(c) as a maximum likelihood estimate of λ in the truncated Laplacian probability density function for the sample set {Y_(i):i∈S} according to: computing ${C = {\frac{1}{S}{\sum\limits_{i \in S}{Y_{i}}}}};$ for i≧1, computing ${\lambda_{i} = {C + \frac{y_{c} \cdot {\mathbb{e}}^{{- y_{c}}/\lambda_{i - 1}}}{1 - {\mathbb{e}}^{{- y_{c}}/\lambda_{i - 1}}}}},$ for i=1, 2, . . . until λ_(i)−λ_(i−1)<ε, where ε>0 is a prescribed threshold, and λ₀=C; and determining a final λ_(i) as an approximation for λ_(y) _(c) .
 20. A method as claimed in claim 1, wherein the composite model is according to a discrete distribution function: $\quad\left\{ \begin{matrix} {p_{0} = {b\; p}} & \; \\ {p_{i} = {{b\left( {1 - p} \right)}{\frac{1}{2}\left\lbrack {1 - {\mathbb{e}}^{- \frac{q}{\lambda}}} \right\rbrack}{{\mathbb{e}}^{{- \frac{q}{\lambda}}{({{i} - 1})}}/\left( {1 - {\mathbb{e}}^{{- \frac{q}{\lambda}}K}} \right)}}} & {{{{if}\mspace{14mu} i} = {\pm 1}},{\pm 2},\ldots\mspace{14mu},{\pm K}} \\ {p_{i} = \frac{1 - b}{2\left( {a - K} \right)}} & {{{if}\mspace{14mu} K} < {i} \leq a} \end{matrix} \right.$ wherein 1≦K≦a, b is a probability parameter, q is a step size, a is the largest index in a given sequence of discrete transform indices, p,λ and K are model parameters, and K is one of the boundary coefficient values.
 21. A method as claimed in claim 20, further comprising determining λ_(K) as a maximum likelihood estimate of λ, for a sequence of indices u^(n)=u₁, u₂, . . . , u_(n), and N₁(K)={j:0<|u_(j)|≦K}, according to: computing ${C = {\frac{1}{{N_{1}(K)}}{\sum\limits_{j \in {N_{1}{(K)}}}\left( {{u_{j}} - 1} \right)}}};$ initializing ${\lambda^{(0)} = \frac{q}{\ln\frac{1 + C}{C}}};$ for i≧1, computing $\quad\left\{ \begin{matrix} {C_{i} = {C + \frac{K}{{\mathbb{e}}^{K\;{q/\lambda^{({i - 1})}}} - 1}}} \\ {\lambda^{(i)} = \frac{q}{\ln\frac{1 + C_{i}}{C_{i}}}} \end{matrix} \right.$ for i=1, 2, . . . until λ^((i))−λ^((i−1))<ε, where ε>0 is a prescribed threshold; and determining a final λ^((i)) as an approximation for λ_(K).
 22. A method as claimed in claim 1, wherein the composite model is according to a probability density function: ${p\left( {\left. y \middle| {\overset{\_}{y}}_{c} \right.,\overset{\_}{b},\overset{\_}{\theta}} \right)}\overset{\Delta}{=}\left\{ \begin{matrix} {\frac{b_{1}}{{2{F\left( y_{c_{1}} \middle| \theta_{1} \right)}} - 1}{f\left( y \middle| \theta_{1} \right)}} & {{{if}\mspace{14mu}{y}} < y_{c_{1}}} \\ {\frac{b_{2}}{2\left\lbrack {{F\left( y_{c_{2}} \middle| \theta_{2} \right)} - {F\left( y_{c_{1}} \middle| \theta_{2} \right)}} \right\rbrack}{f\left( y \middle| \theta_{2} \right)}} & {{{if}\mspace{14mu} y_{c_{1}}} < {y} < y_{c_{2}}} \\ \vdots & \; \\ {{\frac{b_{l}}{2\left\lbrack {{F\left( y_{c_{l}} \middle| \theta_{l} \right)} - {F\left( y_{c_{l - 1}} \middle| \theta_{l} \right)}} \right\rbrack}{f\left( y \middle| \theta_{l} \right)}},} & {{{if}\mspace{14mu} y_{c_{l - 1}}} < {y} < y_{c_{l}}} \\ {{\frac{b_{l + 1}}{2\left\lbrack {{F\left( a \middle| \theta_{l + 1} \right)} - {F\left( y_{c_{l}} \middle| \theta_{l + 1} \right)}} \right\rbrack}{f\left( y \middle| \theta_{l + 1} \right)}},} & {{{if}\mspace{14mu} y_{c_{l}}} < {y} \leq a} \\ 0 & {otherwise} \end{matrix} \right.$ wherein l is the number of further distribution models included in the at least one further distribution model, y _(c)=(y_(c) ₁ , y_(c) ₂ , . . . , y_(c) _(l) ) represent the at least one boundary coefficient value, f(y|θ) is a probability density function with parameters θ∈Θ, where θ is a vector, and Θ is a parameter space, F(y|θ) is a corresponding cumulative density function to f(y|θ), b=(b₁, b₂, . . . , b_(l+1)) is a probability vector with b_(l+1) representing the probability for the first distribution model and each b_(i), 1≦i≦l, representing the probability for the ith further distribution model, θ=(θ₁, θ₂, . . . , θ_(l+1)) with θ_(l+1) being the parameter of the first distribution model and each θ_(i), 1≦i≦l, being the parameter of the ith further distribution model, and a represents the largest magnitude a sample y can take.
 23. A method as claimed in claim 1, wherein the composite model is according to a discrete distribution function: ${p_{i}\left( {\overset{\_}{K},\overset{\_}{\lambda},\overset{\_}{b}} \right)}\overset{\Delta}{=}\left\{ \begin{matrix} b_{0} & {{{if}\mspace{14mu} i} = 0} \\ {\frac{b_{1}}{2}{\mathbb{e}}^{{- \frac{q}{\lambda_{1}}}{({{i} - 1})}}\frac{1 - {\mathbb{e}}^{- \frac{q}{\lambda_{1}}}}{1 - {\mathbb{e}}^{{- \frac{q}{\lambda_{1}}}K_{1}}}} & {{{if}\mspace{14mu} 0} < {i} \leq K_{1}} \\ {\frac{b_{2}}{2}{\mathbb{e}}^{{- \frac{q}{\lambda_{2}}}{({{i} - K_{1} - 1})}}\frac{1 - {\mathbb{e}}^{- \frac{q}{\lambda_{2}}}}{1 - {\mathbb{e}}^{- {\frac{q}{\lambda_{2}}{\lbrack{K_{2} - K_{1}}\rbrack}}}}} & {{{if}\mspace{14mu} K_{1}} < {i} \leq K_{2}} \\ \vdots & \; \\ {\frac{b_{l + 1}}{2}{\mathbb{e}}^{{- \frac{q}{\lambda_{l + 1}}}{({{i} - K_{l} - 1})}}\frac{1 - {\mathbb{e}}^{- \frac{q}{\lambda_{l + 1}}}}{1 - {\mathbb{e}}^{- {\frac{q}{\lambda_{l + 1}}{\lbrack{a - K_{l}}\rbrack}}}}} & {{{if}\mspace{14mu} K_{l}} < {i} \leq a} \\ 0 & {otherwise} \end{matrix} \right.$ where K=(K₁, . . . , K_(l)) with K₀=0<K₁<K₂< . . . <K_(l)<K_(l+1)=a, λ=(λ₁, . . . , λ_(l), λ_(l+1)) with λ_(i)≧0, b=(b₀, b₁, . . . , b_(l+1)) with b_(i)≧0 and b₀+b₁+ . . . +b_(l+1)=1, wherein b is a probability vector parameter, l is the number of further distribution models included in the at least one further distribution model, q is a step size, a is the largest index in a given sequence of discrete transform indices, K, λ, and b are model parameters, and K represents the boundary coefficient values.
 24. A method as claimed in claim 23, further comprising estimating K₁, b₁, and λ₁ using a greedy algorithm by: initializing K₁; recursively growing the segment bounded by K₁ by padding each sample immediately adjacent to that segment into that segment until that sample and that segment are deemed to come from different models; selecting K₁ as the magnitude of the second last padded sample; and determining b₁ and λ₁ based on all samples falling within the segment bounded by K₁.
 25. A method as claimed in claim 24, further comprising determining l and the remaining values in K, λ, and b by repeatedly applying said greedy algorithm to the remaining samples which do not fall within segments bounded by all K_(i) determined already until there are no more remaining samples.
 26. A method as claimed in claim 1, wherein the composite model is according to a probability density function: $p(y \mid \bar{y}_{c},\bar{b},\bar{\lambda}) \triangleq \begin{cases} \frac{b_{1}}{1 - e^{-y_{c_{1}}/\lambda_{1}}}\, \frac{1}{2\lambda_{1}}\, e^{-\frac{|y|}{\lambda_{1}}} & \text{if } |y| < y_{c_{1}} \\ \frac{b_{2}}{1 - e^{-(y_{c_{2}} - y_{c_{1}})/\lambda_{2}}}\, \frac{1}{2\lambda_{2}}\, e^{-\frac{|y| - y_{c_{1}}}{\lambda_{2}}} & \text{if } y_{c_{1}} < |y| < y_{c_{2}} \\ \vdots & \\ \frac{b_{l+1}}{1 - e^{-(y_{c_{l+1}} - y_{c_{l}})/\lambda_{l+1}}}\, \frac{1}{2\lambda_{l+1}}\, e^{-\frac{|y| - y_{c_{l}}}{\lambda_{l+1}}} & \text{if } y_{c_{l}} < |y| \leq a \\ 0 & \text{otherwise} \end{cases}$ wherein l is the number of further distribution models included in the at least one further distribution model, ȳ_c=(y_(c₁), y_(c₂), . . . , y_(c_l)) represents the at least one boundary coefficient value, b̄=(b₁, b₂, . . . , b_(l+1)) is a probability vector with b_(l+1) representing the probability for the first distribution model and each b_(i), 1≦i≦l, representing the probability for the ith further distribution model with b_(i)≧0 and b₁+b₂+ . . . +b_(l)+b_(l+1)=1, λ̄=(λ₁, λ₂, . . . , λ_(l+1)) with λ_(l+1) being the parameter of the first distribution model and each λ_(i), 1≦i≦l, being the parameter of the ith further distribution model with λ_(i)>0, i=1, 2, . . . , l+1, and a represents the largest magnitude a sample y can take.
 27. A method as claimed in claim 26, further comprising estimating y_(c₁), b₁, and λ₁ using a greedy algorithm by: initializing y_(c₁); recursively growing the segment bounded by y_(c₁) by padding each sample immediately adjacent to that segment into that segment until that sample and that segment are deemed to come from different models; selecting y_(c₁) as the magnitude of the second last padded sample; and determining b₁ and λ₁ based on all samples falling within the segment bounded by y_(c₁).
 28. A method as claimed in claim 27, further comprising determining l and the remaining values in ȳ_c, b̄, and λ̄ by repeatedly applying said greedy algorithm to the remaining samples which do not fall within segments bounded by all y_(c_i) determined already until there are no more remaining samples.
 29. A device, comprising: memory; a component configured to access a set of transform coefficients; and a processor configured to execute instructions stored in the memory in order to: determine at least one boundary coefficient value, determine one or more parameters for a first distribution model for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values, determine parameters for at least one further distribution model for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values, and perform a device operation on at least part of a composite distribution model which is a composite of the first distribution model and the at least one further distribution model having the respective determined parameters.
 30. A non-transitory computer readable medium containing instructions executable by a processor of a device for a set of transform coefficients, the instructions comprising: instructions for determining at least one boundary coefficient value; instructions for determining one or more parameters for a first distribution model for transform coefficients of the set the magnitudes of which are greater than one of the boundary coefficient values; instructions for determining parameters for at least one further distribution model for transform coefficients of the set the magnitudes of which are less than the one of the boundary coefficient values; and instructions for performing a device operation on at least part of a composite distribution model which is a composite of the first distribution model and the at least one further distribution model having the respective determined parameters. 
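
By way of illustration only, the discrete distribution function recited in claim 23 may be evaluated as in the following Python sketch. The function name `tcm_pmf`, its argument layout, and the example parameter values are assumptions made for illustration and do not form part of the claims. The sketch further assumes that every λ is strictly positive, so that the exponents are well defined.

```python
import math


def tcm_pmf(i, K, lam, b, q, a):
    """Probability of index i under the discrete composite model of claim 23.

    Hypothetical helper, for illustration only.
    K   : interior boundaries (K_1, ..., K_l), strictly increasing, below a
    lam : (lambda_1, ..., lambda_{l+1}); assumed strictly positive here
    b   : (b_0, b_1, ..., b_{l+1}); non-negative and summing to 1
    q   : step size
    a   : largest index magnitude in the sequence
    """
    m = abs(i)
    if m == 0:
        return b[0]
    if m > a:
        return 0.0
    edges = [0] + list(K) + [a]  # K_0 = 0 and K_{l+1} = a
    for j in range(1, len(edges)):
        lo, hi = edges[j - 1], edges[j]
        if lo < m <= hi:
            r = q / lam[j - 1]  # q / lambda_j
            # Truncated geometric term over the hi - lo indices of segment j
            num = math.exp(-r * (m - lo - 1)) * (1.0 - math.exp(-r))
            den = 1.0 - math.exp(-r * (hi - lo))
            return 0.5 * b[j] * num / den
    return 0.0


# Example with one further distribution model (l = 1); values are made up.
K, lam, b = (3,), (2.0, 50.0), (0.4, 0.5, 0.1)
total = sum(tcm_pmf(i, K, lam, b, q=1.0, a=10) for i in range(-10, 11))
print(total)  # expected to be very close to 1
```

Within each segment the positive and negative indices together carry the probability b_(j), which is why the values over all indices from -a to a are expected to sum to one, up to floating-point rounding.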